Preface


It was our privilege to serve as the program chairs for CAV 2023, the 35th International 
Conference on Computer-Aided Verification. CAV 2023 was held during July 19-22, 
2023, and the pre-conference workshops were held during July 17-18, 2023. CAV 2023
was an in-person event, in Paris, France. 

CAV is an annual conference dedicated to the advancement of the theory and practice 
of computer-aided formal analysis methods for hardware and software systems. The 
primary focus of CAV is to extend the frontiers of verification techniques by expanding 
to new domains such as security, quantum computing, and machine learning. This puts 
CAV at the cutting edge of formal methods research, and this year’s program is a reflection
of this commitment. 

CAV 2023 received a large number of submissions (261). We accepted 15 tool 
papers, 3 case-study papers, and 49 regular papers, which amounts to an acceptance 
rate of roughly 26%. The accepted papers cover a wide spectrum of topics, from theo- 
retical results to applications of formal methods. These papers apply or extend formal 
methods to a wide range of domains such as concurrency, machine learning and neu- 
ral networks, quantum systems, as well as hybrid and stochastic systems. The program 
featured keynote talks by Ruzica Piskac (Yale University), Sumit Gulwani (Microsoft), 
and Caroline Trippel (Stanford University). In addition to the contributed talks, CAV 
also hosted the CAV Award ceremony, and a report from the Synthesis Competition 
(SYNTCOMP) chairs. 

In addition to the main conference, CAV 2023 hosted the following workshops: Meet- 
ing on String Constraints and Applications (MOSCA), Verification Witnesses and Their 
Validation (VeWit), Verification of Probabilistic Programs (VeriProP), Open Problems 
in Learning and Verification of Neural Networks (WOLVERINE), Deep Learning-aided 
Verification (DAV), Hyperproperties: Advances in Theory and Practice (HYPER),
Synthesis (SYNT), Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), and
Verification Mentoring Workshop (VMW). CAV 2023 also hosted a workshop dedicated
to Thomas A. Henzinger for his 60th birthday.

Organizing a flagship conference like CAV requires a great deal of effort from the 
community. The Program Committee for CAV 2023 consisted of 76 members—a com- 
mittee of this size ensures that each member has to review only a reasonable number of 
papers in the allotted time. In all, the committee members wrote over 730 reviews while 
investing significant effort to maintain and ensure the high quality of the conference pro- 
gram. We are grateful to the CAV 2023 Program Committee for their outstanding efforts 
in evaluating the submissions and making sure that each paper got a fair chance. As in
recent years at CAV, we made artifact evaluation mandatory for tool paper submissions,
but optional for the rest of the accepted papers. This year we received 48 artifact submis- 
sions, out of which 47 submissions received at least one badge. The Artifact Evaluation 
Committee consisted of 119 members who put in significant effort to evaluate each arti- 
fact. The goal of this process was to provide constructive feedback to tool developers and 
help make the research published in CAV more reproducible. We are also very grateful
to the Artifact Evaluation Committee for their hard work and dedication in evaluating 
the submitted artifacts. 

CAV 2023 would not have been possible without the tremendous help we received 
from several individuals, and we would like to thank everyone who helped make CAV 
2023 a success. We would like to thank Alessandro Cimatti, Isil Dillig, Javier Esparza, 
Azadeh Farzan, Joost-Pieter Katoen and Corina Pasareanu for serving as area chairs. 
We also thank Bernhard Kragl and Daniel Dietsch for chairing the Artifact Evaluation
Committee. We also thank Mohamed Faouzi Atig for chairing the workshop organization 
as well as leading publicity efforts, Eric Koskinen as the fellowship chair, Sebastian 
Bardin and Ruzica Piskac as sponsorship chairs, and Srinidhi Nagendra as the website 
chair. Srinidhi, along with Enrique Roman Calvo, helped prepare the proceedings. We 
also thank Ankush Desai, Eric Koskinen, Burcu Kulahcioglu Ozkan, Marijana Lazic, and 
Matteo Sammartino for chairing the mentoring workshop. Last but not least, we would 
like to thank the members of the CAV Steering Committee (Kenneth McMillan, Aarti 
Gupta, Orna Grumberg, and Daniel Kroening) for helping us with several important 
aspects of organizing CAV 2023. 

We hope that you will find the proceedings of CAV 2023 scientifically interesting 
and thought-provoking! 

A Flexible Toolchain
for Symbolic Rabin Games 
under Fair and Stochastic Uncertainties 


Rupak Majumdar, Kaushik Mallik, Mateusz Rychlicki,
Sadegh.Soudjani@newcastle.ac.uk


Abstract. We present a flexible and efficient toolchain to symbolically 
solve (standard) Rabin games, fair-adversarial Rabin games, and 2½-player
Rabin games. To the best of our knowledge, our tools are the first ones to
be able to solve these problems. Furthermore, using these flexible game 
solvers as a back-end, we implemented a tool for computing correct- 
by-construction controllers for stochastic dynamical systems under LTL 
specifications. Our implementations use the recent theoretical result that 
all of these games can be solved using the same symbolic fixpoint algo- 
rithm but utilizing different, domain specific calculations of the involved 
predecessor operators. The main feature of our toolchain is the utilization 
of two programming abstractions: one to separate the symbolic fixpoint 
computations from the predecessor calculations, and another one to allow 
the integration of different BDD libraries as back-ends. In particular, we 
employ a multi-threaded execution of the fixpoint algorithm by using the 
multi-threaded BDD library Sylvan, which leads to enormous computa- 
tional savings. 


1 Introduction 


Piterman and Pnueli [17] derived the currently best known symbolic algorithm
for solving two-player Rabin games over finite graphs, with a theoretical
complexity of $O(n^{k+1} k!)$ in time and space, where $n$ is the number of states and $k$
is the number of pairs in the winning condition. This work did not provide an
implementation. In a series of papers [3, 4, 15, 16], Mallik et al. showed that this
symbolic algorithm can be extended to solve different automated design ques- 
tions for reactive hardware, software, and cyber-physical systems under fair or 
stochastic uncertainties. The main contribution of their work is to show that 
these extensions only require a very mild syntactic change of the Piterman- 
Pnueli fixed-point algorithm (with very little effect on its overall complexity) and 
domain-specific realizations of two types of predecessor operators used therein. 

Using this insight, we present a toolchain for the efficient symbolic solution 
of different extensions of Rabin games. We have created three inter-connected 
libraries for solving different parts of the problem at different levels of abstraction.
The first library, called Genie, offers a set of virtual classes to implement
the fixpoint algorithm abstractly, leaving open (i.e., virtual) the predecessor
computation. Alongside, we created two other libraries, called FairSyn and
Mascot-SDS, where FairSyn solves fair-adversarial [4] and 2½-player Rabin
games [3], while Mascot-SDS solves abstraction-based control problems [15, 16].
FairSyn and Mascot-SDS use the optimized fixpoint computation provided by 
Genie, with domain specific implementations of the predecessor operations. 

The flexibility of our toolchain comes from two different programming 
abstractions in Genie. Firstly, Genie offers multiple high-level optimizations for 
solving the Rabin fixpoint, such as parallel execution (requires a thread-safe 
BDD library like Sylvan) and an acceleration technique [13], while abstract- 
ing away from the low-level implementations of the predecessor functions. As a 
result, any synthesis problem using the core Rabin fixpoint of Genie can use 
the optimizations without spending any extra implementation effort. We used 
these optimizations from FairSyn and Mascot-SDS, and achieved remarkable 
computational savings. Secondly, Genie offers easy portability of code from one
BDD library to another, which is important as different BDD libraries have
different pros and cons, and the choice of the best library depends on the needs.
We empirically showed how switching between the two BDD libraries Sylvan
and CUDD impacts the performance of FairSyn and Mascot-SDS: overall, the
Sylvan-based experiments were significantly faster, whereas the CUDD-based experiments
consumed considerably less memory. Using the combined power of
multi-threaded BDD operations using Sylvan and the optimizations offered by 
Genie, Mascot-SDS was between one and three orders of magnitude faster than 
the state-of-the-art tool in our experiments. 


Comparison with Existing Tools: We are not aware of any available tool to 
directly solve (normal or stochastic) Rabin games symbolically. However, it is well- 
known how to translate stochastic Rabin games into (standard) Rabin games [5], 
and Rabin games into parity games, for which efficient solvers exist, e.g. oink [9]. 
Yet, efficient solutions of stochastic Rabin games via parity games are difficult to 
obtain, because: (i) the translation from a stochastic Rabin game to a Rabin game 
involves a quadratic blow-up, and the translation from a Rabin game to a parity 
game results in an exponential blow-up in the size of the game, (ii) symbolic fix- 
point computations become cumbersome very fast for parity games, as the number 
of vertices and/or colors in the game graph increases, leading to high computa- 
tion times in practice, and (iii) the only known algorithms capable of handling fair 
and stochastic uncertainties efficiently are all symbolic in nature, while most of
the efficient parity game solvers are non-symbolic. Additionally, unlike the Rabin 
fixpoint, the nesting of the parity fixpoint does not enable parallel execution. 

While it is well known that for normal parity games, computational tractabil- 
ity can be achieved by different non-symbolic algorithms, such as Zielonka’s 
algorithm [22], tangle learning [8] or strategy-improvement [19], implemented in 
oink [9], it is currently unclear if and how these algorithms allow for the efficient 
handling of fair or stochastic uncertainties. We are therefore unable to compare 
our toolchain to the translational workflow via parity games in a fair manner. 

In the area of temporal logic control of stochastic systems, Mascot-SDS has 
two powerful features: (a) it can handle synthesis for the rich class of ω-regular
(infinite-horizon) specifications, and (b) it provides both over- and under-
approximations of the solution, thus enabling a quantitative refinement loop
for improving the precision of the approximation. The features of Mascot-SDS
are compared with other tools in the stochastic category of the recent ARCH
competition (see the report [1] for the list of participating tools). As concluded in
the report of the competition, other state-of-the-art tools in the stochastic category
are either limited to a fragment of ω-regular specifications or do not provide
any indication of the quality of the involved approximations. The only tool [10]
that supports ω-regular specifications uses a different, non-symbolic
approach, which Mascot-SDS significantly outperforms in our experiments
(see Sect. 4.2). Even if we leave stochasticity aside, our tool implements a new and
orthogonal heuristic for multi-threaded computation of Rabin fixpoints, which 
is not considered by other controller synthesis tools [11]. 


2 Theoretical Background 


We briefly state the synthesis problems our toolchain is solving. We follow the 
same (standard) notation for two-player game graphs, winning regions, strategies 
and μ-calculus formulas, as in [4].


2.1 Solving Rabin Games Symbolically 


Given a game graph $G = (V, V_0, V_1, E)$, a Rabin game is specified using a set
of Rabin pairs $\mathcal{R} = \{(Q_1, R_1), \ldots, (Q_k, R_k)\}$, with $Q_i, R_i \subseteq V$ for every
$i \in [1; k]$, and $\varphi := \bigvee_{i \in [1;k]} (\Diamond\Box \overline{R}_i \wedge \Box\Diamond Q_i)$ being the Rabin acceptance condition.
Piterman and Pnueli [17] showed that the winning region of a Rabin game can be
computed using the μ-calculus expression given in (2), where the set transformers
$\mathrm{Cpre} : 2^V \to 2^V$ and $\mathrm{Apre} : 2^V \times 2^V \to 2^V$ are defined for every $S, T \subseteq V$ as:

$$\mathrm{Cpre}(S) := \{v \in V_0 \mid \exists v' \in S \,.\, (v, v') \in E\} \cup \{v \in V_1 \mid \forall v' \in V \,.\, (v, v') \in E \Rightarrow v' \in S\}, \tag{1a}$$
$$\mathrm{Apre}(S, T) := \mathrm{Cpre}(T). \tag{1b}$$


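The symbolic fixpoint algorithm for solving Rabin games with $\mathcal{R} = \{(Q_1, R_1), \ldots, (Q_k, R_k)\}$ and $K = [1; k]$:

$$\nu Y_{p_0}.\ \mu X_{p_0}.\ \bigcup_{p_1 \in K} \nu Y_{p_1}.\ \mu X_{p_1}.\ \bigcup_{p_2 \in K \setminus \{p_1\}} \nu Y_{p_2}.\ \mu X_{p_2}.\ \cdots \bigcup_{p_k \in K \setminus \{p_1, \ldots, p_{k-1}\}} \nu Y_{p_k}.\ \mu X_{p_k}.\ \Big[ \bigcup_{j=0}^{k} C_{p_j} \Big], \tag{2}$$

where
$$C_{p_j} := \Big( \bigcap_{i=0}^{j} \overline{R}_{p_i} \Big) \cap \Big[ \big( Q_{p_j} \cap \mathrm{Cpre}(Y_{p_j}) \big) \cup \mathrm{Apre}(Y_{p_j}, X_{p_j}) \Big],$$
and the definitions of Cpre and Apre are problem specific.


Fair-Adversarial Rabin Games. A Rabin game is called fair-adversarial
when there is an additional fairness assumption on a set of edges originating from
Player 1 vertices in $G$. Let $E^\ell \subseteq E \cap (V_1 \times V)$ be a given set of edges, called the live
edges. Given $E^\ell$ and a Rabin winning condition $\varphi$, we say that Player 0 wins the
fair-adversarial Rabin game from a vertex $v$ if Player 0 wins the (normal) game
for the modified winning condition

$$\varphi^\ell := \Big( \bigwedge_{e = (v, v') \in E^\ell} (\Box\Diamond v \Rightarrow \Box\Diamond e) \Big) \Rightarrow \varphi.$$

Based on the results of Banerjee et al. [4], fair-adversarial Rabin games can
be solved via (2), by defining for every $S, T \subseteq V$

$$\mathrm{Cpre}(S) := \{v \in V_0 \mid \exists v' \in S \,.\, (v, v') \in E\} \cup \{v \in V_1 \mid \forall v' \in V \,.\, (v, v') \in E \Rightarrow v' \in S\}, \tag{3a}$$
$$\mathrm{Apre}(S, T) := \mathrm{Cpre}(T) \cup \{v \in \mathrm{Cpre}(S) \cap V_1 \mid \exists v' \in T \,.\, (v, v') \in E^\ell\}. \tag{3b}$$


We see that (3) coincides with (1) if $E^\ell$ is empty.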
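
To make the two predecessor operators of (3) concrete, the following is a minimal explicit-state sketch. It is only an illustration under stated assumptions: the tools described in this paper work symbolically on BDDs, whereas this sketch uses plain `std::set` containers, and the names `Game`, `Edge`, `VSet`, `cpre`, and `apre` are hypothetical and not part of any of the tools.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>;
using VSet = std::set<std::size_t>;

// Explicit-state toy game graph (illustrative only; not the BDD encoding of the tools).
struct Game {
    std::size_t n;              // vertices are 0 .. n-1
    std::vector<bool> player0;  // player0[v] is true for v in V_0, false for v in V_1
    std::set<Edge> edges;       // E
    std::set<Edge> live_edges;  // E^l: edges of E that leave Player 1 vertices
};

// Cpre(S) as in (3a): a Player 0 vertex needs some successor in S,
// a Player 1 vertex needs all of its successors in S.
VSet cpre(const Game& g, const VSet& s) {
    assert(g.player0.size() == g.n);
    VSet result;
    for (std::size_t v = 0; v < g.n; ++v) {
        bool some_in = false, all_in = true;
        for (const Edge& e : g.edges) {
            if (e.first != v) continue;
            if (s.count(e.second)) some_in = true; else all_in = false;
        }
        if (g.player0[v] ? some_in : all_in) result.insert(v);
    }
    return result;
}

// Apre(S, T) as in (3b): Cpre(T), plus every Player 1 vertex in Cpre(S)
// that has a live edge into T.
VSet apre(const Game& g, const VSet& s, const VSet& t) {
    VSet result = cpre(g, t);
    const VSet in_s = cpre(g, s);
    for (const Edge& e : g.live_edges) {
        if (!g.player0[e.first] && in_s.count(e.first) && t.count(e.second)) {
            result.insert(e.first);
        }
    }
    return result;
}
```

When `live_edges` is empty, the loop in `apre` adds nothing and the function returns `cpre(g, t)`, which mirrors the observation that (3) coincides with (1) in that case.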

2½-Player Rabin Games. A 2½-player game is played on a game graph
$(V, V_0, V_1, V_r, E)$, and the only difference from a 2-player game graph is the
additional set of vertices $V_r$, which are called the random vertices. The sets $V_0$, $V_1$,
and $V_r$ partition $V$. Based on the results of [3], 2½-player Rabin games can be
solved via (2) by defining for all $S, T \subseteq V$

$$\mathrm{Cpre}(S) := \{v \in V_0 \mid \exists v' \in S \,.\, (v, v') \in E\} \cup \{v \in V_1 \cup V_r \mid \forall v' \in V \,.\, (v, v') \in E \Rightarrow v' \in S\}, \tag{4a}$$
$$\mathrm{Apre}(S, T) := \mathrm{Cpre}(T) \cup \{v \in \mathrm{Cpre}(S) \cap V_r \mid \exists v' \in T \,.\, (v, v') \in E\}. \tag{4b}$$
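
As a worked instance of the nested structure of (2), consider a single Rabin pair ($k = 1$). The unfolding below assumes the convention that $p_0 = 0$ with $Q_{p_0} = R_{p_0} = \emptyset$; this convention is an assumption here, since it is not restated in this section, and should be checked against [4].

```latex
\nu Y_{0}.\ \mu X_{0}.\ \nu Y_{1}.\ \mu X_{1}.\ \big[\, C_{p_0} \cup C_{p_1} \,\big],
\qquad
C_{p_0} = \mathrm{Apre}(Y_{0}, X_{0}),
\qquad
C_{p_1} = \overline{R}_{1} \cap \big[ (Q_{1} \cap \mathrm{Cpre}(Y_{1})) \cup \mathrm{Apre}(Y_{1}, X_{1}) \big].
```

Under the same assumption, $C_{p_0}$ loses its $Q$-term because $Q_{p_0}$ is empty, and the intersection of complements in $C_{p_1}$ reduces to $\overline{R}_1$. For general $k$, the nested unions range over all orderings $(p_1, \ldots, p_k)$ of the pair indices, which gives $k!$ innermost branches.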


2.2 Computing Symbolic Controllers for Stochastic Dynamical 
Systems 


A discrete-time stochastic dynamical system $S$ is represented using a tuple
$(X, U, W, f)$, where $X \subseteq \mathbb{R}^n$ is a continuous state space, $U$ is a finite set of
control inputs, $W \subseteq \mathbb{R}^n$ is a bounded set of disturbances, and $f : X \times U \to X$ is
the nominal dynamics. If $x^k \in X$ and $u^k \in U$ are the state and control input of
$S$ at some time $k \in \mathbb{N}$, then the state at the next time step is given by:

$$x^{k+1} = f(x^k, u^k) + w^k, \tag{5}$$

where $w^k$ is the disturbance at time $k$, which is sampled from $W$ using some
(possibly unknown) distribution. Without loss of generality we assume that $W$
is centered around the origin, which can be easily achieved by shifting $f$ if needed.
A path of $S$ originating at $x^0 \in X$ is an infinite sequence of states $x^0 x^1 \ldots$ for a
given infinite sequence of control inputs $u^0 u^1 \ldots$, such that (5) is satisfied.
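
The recursion (5) can be illustrated with a short simulation of a finite path prefix. This is a sketch under stated assumptions: the state is scalar, the disturbance set is the symmetric interval [-w_max, w_max], and the disturbance is drawn uniformly, although (5) allows any (possibly unknown) distribution on W. The function name `simulate_path` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <random>
#include <vector>

// Simulate a finite prefix of a path of x^{k+1} = f(x^k, u^k) + w^k for a scalar state.
// Assumption of this sketch: w^k is uniform on W = [-w_max, w_max].
std::vector<double> simulate_path(const std::function<double(double, int)>& f,
                                  double x0,
                                  const std::vector<int>& inputs,
                                  double w_max,
                                  unsigned seed) {
    assert(w_max >= 0.0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> noise(-w_max, w_max);
    std::vector<double> path{x0};
    for (int u : inputs) {
        // One step of (5): nominal successor plus sampled disturbance.
        path.push_back(f(path.back(), u) + noise(rng));
    }
    return path;
}
```

A call such as `simulate_path(f, 0.0, {1, 1, 1}, 0.1, 42u)` returns four states, and each consecutive pair differs from the nominal successor by at most `w_max`.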

Let $\varphi$ be a given Rabin specification, called the control objective, defined
using a finite set of predicates over $X$. For every controller $C : X \to U$, the
domain of $C$, written $\mathrm{Dom}(C)$, is the set of states from where the property $\varphi$
can be satisfied with probability 1. For a fixed $\varphi$, a controller $C$ is called optimal
if $\mathrm{Dom}(C)$ contains the domain of every other controller $C'$. The problem of
computing such an optimal controller for the system in (5) is in general undecidable.
Following [15], we compute an approximate solution instead.

This approximate solution is obtained by a discretization of the state space.
For this, we assume that the state space $X$ is a closed and bounded subset
of the $n$-dimensional Euclidean space $\mathbb{R}^n$ for some $n > 0$, and use the
notation $[\![a, b)\!)$ to denote the set $\prod_{i \in [1;n]} [a_i, b_i)$. Now, consider a grid-based
discretization $\widehat{X}$ of $X$, whose elements are hyper-rectangles of the form $[\![a, b)\!)$ with
$a, b \in \mathbb{R}^n$. One of the key ingredients of our abstraction process is a function
$\widehat{f}$ providing a hyper-rectangular over-approximation of the one-step reachable
set of the nominal dynamics $f$ of the system $S$: for every grid element $\widehat{x} \in \widehat{X}$,
we have $\widehat{f}(\widehat{x}, u) = [\![a', b')\!) \supseteq \{x' \in X \mid \exists x \in \widehat{x} \,.\, x' = f(x, u)\}$.
The function $\widehat{f}$ is known to be available for a
wide class of commonly used forms of the function $f$, and in our implementation
we assumed that $f$ is mixed-monotone and $\widehat{f}$ is the so-called decomposition
function (see standard literature for details [7]).
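
The way a decomposition function yields a hyper-rectangular over-approximation can be sketched as follows. The sketch relies on the standard characterization from the mixed-monotonicity literature: a decomposition function d(x, y) is nondecreasing in x, nonincreasing in y, and satisfies d(x, x) = f(x), so that the image of a box with corners a and b is contained in the box with corners d(a, b) and d(b, a). The dynamics `f` below are a hypothetical two-dimensional example, not a system from this paper, and the names `Vec2`, `Box`, `d`, and `over_approx` are illustrative only.

```cpp
#include <array>
#include <cassert>

using Vec2 = std::array<double, 2>;
struct Box { Vec2 lo, hi; };  // hyper-rectangle with lower corner lo and upper corner hi

// Hypothetical mixed-monotone nominal dynamics (not taken from the paper).
Vec2 f(const Vec2& x, double u) {
    return {x[0] - 0.5 * x[1] + u, 0.5 * x[0] + x[1]};
}

// A decomposition function for f: nondecreasing in x, nonincreasing in y,
// and d(x, x, u) == f(x, u).
Vec2 d(const Vec2& x, const Vec2& y, double u) {
    return {x[0] - 0.5 * y[1] + u, 0.5 * x[0] + x[1]};
}

// Hyper-rectangular over-approximation of the one-step image of a grid cell.
Box over_approx(const Box& cell, double u) {
    assert(cell.lo[0] <= cell.hi[0] && cell.lo[1] <= cell.hi[1]);
    return {d(cell.lo, cell.hi, u), d(cell.hi, cell.lo, u)};
}
```

For the unit cell with corners (0, 0) and (1, 1) and input u = 0, the sketch returns the box with corners (-0.5, 0) and (1, 1.5), which contains f(x, 0) for every x in the cell.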

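Given the over-approximation of the nominal dynamics obtained through
$\widehat{f}$, we define, respectively, the over- and the under-approximation of the
perturbed dynamics as $\overline{g}(\widehat{x}, u) := W \oplus \widehat{f}(\widehat{x}, u)$ and $\underline{g}(\widehat{x}, u) := W \ominus (-\widehat{f}(\widehat{x}, u))$,
where $\oplus$ and $\ominus$ respectively denote the Minkowski sum and the Minkowski
difference. Next, we transfer $\overline{g}$ and $\underline{g}$ to the abstract state space $\widehat{X}$ to obtain,
respectively, the over- and the under-approximation in terms of the abstract
transition function¹, i.e., $\overline{h}(\widehat{x}, u) := \{\widehat{x}' \in \widehat{X} \mid \overline{g}(\widehat{x}, u) \cap \widehat{x}' \neq \emptyset\}$ and
$\underline{h}(\widehat{x}, u) := \{\widehat{x}' \in \widehat{X} \mid \underline{g}(\widehat{x}, u) \cap \widehat{x}' \neq \emptyset\}$. With $\overline{h}$ and $\underline{h}$ available, it was shown by Majumdar
et al. [16] that the over-approximation of the optimal controller can be computed by
using the fixpoint algorithm in (2), where the predecessor operators are defined
for every $S, T \subseteq \widehat{X}$ as

$$\mathrm{Cpre}(S) := \{\widehat{x} \in \widehat{X} \mid \exists u \in U \,.\, \overline{h}(\widehat{x}, u) \subseteq S\}, \tag{6a}$$
$$\mathrm{Apre}(S, T) := \{\widehat{x} \in \widehat{X} \mid \exists u \in U \,.\, \overline{h}(\widehat{x}, u) \subseteq S \wedge \underline{h}(\widehat{x}, u) \cap T \neq \emptyset\}. \tag{6b}$$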
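
The Minkowski operations and the resulting abstract successors can be sketched in one dimension. The assumptions are loud ones: the state space is an interval split into uniform cells of width `eta`, the disturbance set is W = [-w, w], and the nominal over-approximation of a cell is already given as an interval `F`. All names (`Interval`, `over_g`, `under_g`, `cells_hit`) are hypothetical and unrelated to the Mascot-SDS code.

```cpp
#include <cassert>
#include <vector>

struct Interval { double lo, hi; };  // closed interval [lo, hi]; empty when lo > hi

// W (+) F for W = [-w, w]: every point reachable under some disturbance.
Interval over_g(const Interval& F, double w) { return {F.lo - w, F.hi + w}; }

// W (-) (-F): the points y with y - z in W for every nominal successor z in F.
Interval under_g(const Interval& F, double w) { return {F.hi - w, F.lo + w}; }

// Indices i of the cells [i*eta, (i+1)*eta), 0 <= i < cells, that intersect g.
std::vector<int> cells_hit(const Interval& g, double eta, int cells) {
    assert(eta > 0.0 && cells >= 0);
    std::vector<int> hit;
    if (g.lo > g.hi) return hit;  // the empty set intersects no cell
    for (int i = 0; i < cells; ++i) {
        const double lo = i * eta, hi = (i + 1) * eta;
        if (g.lo < hi && g.hi >= lo) hit.push_back(i);
    }
    return hit;
}
```

With `F = [1.2, 1.6]`, `w = 0.5`, and unit cells, the over-approximation hits cells 0, 1, and 2, while the under-approximation hits only cell 1. In terms of (6), a cell and input satisfy the condition of (6a) when the first list is contained in S, and the condition of (6b) when, in addition, the second list meets T.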

3 Implementation Details


We develop three interconnected tools, Genie, FairSyn, and Mascot-SDS, which
work in close harmony to implement efficient solvers for (2), with
pre-operators defined via (3) and (4) in FairSyn and via (6) in Mascot-SDS. The tools
use binary decision diagrams (BDDs) to symbolically manipulate sets of vertices/states of
the underlying system, and to manage the BDDs, we offer the flexibility to
choose between two well-known existing BDD libraries, namely CUDD
[20] and Sylvan [21]. The two libraries have their own merits: while CUDD
has a significantly lower memory footprint, Sylvan offers superior computation
speed through multi-threaded BDD operations. Thus, the optimal choice of the
library depends on the size of the problem, the computational time limit, and
the memory budget, and through our implementation it is possible to choose
one or the other by, in some cases, changing only a single line of code and, in
the other cases, changing the value of just one flag. Moreover, we expect that
integrating other BDD libraries having the same basic BDD operations in our
tools will be easy and seamless, thanks to the programming abstraction offered
by Genie. Such extensions could bring a more diverse set of computational
strengths for solving the fundamental synthesis problems that we address.

1 Here we assume that $\overline{g}(\widehat{x}, u) \subseteq X$; otherwise we need to take some extra steps.
Details can be found in the work by Majumdar et al. [16].

[Figure 1: class diagram. Genie contains BaseUBDD (virtual BDD class), realized by
CuddUBDD (for CUDD [20]) and SylvanUBDD (for Sylvan [21]), together with BaseFixpoint
(virtual fixpoint class) and RabinAutomaton. FairSyn contains Arena and a Fixpoint class
with Cpre and Apre defined as in (3). Mascot-SDS contains SymbolicSet, SymbolicModel,
and a Fixpoint class with Cpre and Apre defined as in (6).]

Fig. 1. A schematic diagram of interaction among the three tools. Each block represents
one class in the respective tool, and an arrow from class A to class B denotes that B
depends on A. The dependencies within each tool are shown using solid arrows, while the
dependencies of Mascot-SDS and FairSyn on Genie are shown using dashed arrows.

The tools are primarily written in C++, with some small Python scripts
implementing parts of visualizations of outputs. The main classes of the three 
tools and their interactions are depicted in Fig. 1. We briefly describe the core 
functionalities of the tools in the following. 


3.1 Genie 


Genie implements the fixpoint algorithm (2) in the class BaseFixpoint through
two layers of abstraction. One abstraction is through the virtual definitions of
the Cpre and Apre operators, whose concrete implementations are provided in
the front-end synthesis tools (in our case FairSyn and Mascot-SDS). Using this
abstraction, we implemented two different optimizations for the efficient
iterative computation of the Rabin fixpoint in (2), independently of the actual
implementations of the Apre and Cpre operators. The first optimization is a
multi-threaded computation of the Rabin fixpoint, exploiting the fixpoint’s
inherent parallel structure due to the independence among different sequences
$(p_1, p_2, \ldots)$ used to compute $\bigcup_{j=0}^{k} C_{p_j}$. The second optimization is an
accelerated computation of the Rabin fixpoint, achieved through bookkeeping of
intermediate values of the BDD variables. The core of the acceleration procedure for
general μ-calculus fixpoints was proposed by Long et al. [13], and the details
specific to the fixpoint in (2) can be found in the paper by Banerjee et al. [4].

The other abstraction in Genie is the set of virtually defined low-level BDD 
operations in the auxiliary class BaseUBDD, which enable us to easily switch 
between different off-the-shelf BDD libraries. The virtual BDD operations in 
BaseUBDD are concretely realized in the classes CuddUBDD and SylvanUBDD, 
which work as interfaces between, respectively, the CUDD and the Sylvan BDD 
libraries. Support for additional BDD libraries can be easily built by creating 
new interface classes. More details on the functionalities of Genie can be found 
in the longer version of this paper [14]. 
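
The two abstraction layers can be illustrated with a deliberately small sketch. It is not Genie's code or API: the class names `ToyFixpoint` and `ChainGame` are hypothetical, the fixpoint is a plain reachability fixpoint rather than the Rabin fixpoint (2), and a `std::bitset` stands in for a BDD wrapper. What it shows is the division of labor: the iteration is written once against a generic set type and a virtual predecessor operator, and a front-end supplies both.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Fixpoint skeleton, generic in the set representation `Set` (a BDD wrapper in a real tool).
// `Set` only needs operator| and operator==. All names here are hypothetical.
template <class Set>
class ToyFixpoint {
public:
    virtual ~ToyFixpoint() = default;

    // Predecessor operator, left open for the front-end tool to define.
    virtual Set cpre(const Set& s) const = 0;

    // Least fixpoint  mu X. (target | cpre(X)),  computed by iteration from the empty set.
    Set solve(const Set& target) const {
        Set x{};
        for (;;) {
            const Set next = target | cpre(x);
            assert((x | next) == next);  // iterates grow when cpre is monotone
            if (next == x) return x;
            x = next;
        }
    }
};

using StateSet = std::bitset<8>;  // stand-in back-end: subsets of {0, ..., 7}

// One concrete front-end: a chain 0 -> 1 -> 2 -> 3 in which every move is controllable.
class ChainGame : public ToyFixpoint<StateSet> {
public:
    StateSet cpre(const StateSet& s) const override {
        StateSet pre;
        for (std::size_t v = 0; v + 1 < 4; ++v) {
            if (s.test(v + 1)) pre.set(v);
        }
        return pre;
    }
};
```

Solving for the target {3} on the chain yields {0, 1, 2, 3}. Swapping the set type or the predecessor operator does not touch `solve`, which is the property that lets optimizations of the fixpoint iteration be shared across front-ends.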


3.2 FairSyn 


The core of FairSyn is written as a header-only library, which offers the infras- 
tructure to solve (2) with pre-operators defined via (3) and (4). The main 
component of FairSyn is the class Fixpoint, which derives from the class 
BaseFixpoint from Genie, and implements the concrete definitions of Cpre 
and Apre in (3) and (4). 


How to Use: For computing the winning region and the winning strategy in a 
fair-adversarial Rabin game (resp. a 2½-player Rabin game) using FairSyn, one
needs to write a program to create the game as a Fixpoint object. One possible 
way of constructing a Fixpoint object is through a synchronous product of a 
game graph (an object of class Arena) and a specification Rabin automaton (an 
object of class RabinAutomaton) with an input alphabet of sets of nodes of the 
Arena object. Following is a snippet: 


// typedef Genie::CuddUBDD UBDD;   // use this for CUDD
typedef Genie::SylvanUBDD UBDD;    // use this for Sylvan
UBDD base;

// the game graph
Arena<UBDD> A(base, vars, nodes, sys_nodes, env_nodes, edges, live_edges);

// the specification automaton
RabinAutomaton<UBDD> R(base, vars, inp_alphabet, filename);

// the synchronous product
Fixpoint<UBDD> Fp(base, "under", A, R);

// sequential fixpoint solver
// UBDD strategy = Fp.Rabin(true, 20, Fp.nodes_, 0);

// parallel fixpoint solver
UBDD strategy = Fp.Rabin(true, 20, Fp.nodes_, 0, Genie::ParallelRabinRecurse);

where vars is a (possibly initially empty) set of integers which will contain the set
of newly created BDD variables, nodes, sys_nodes, and env_nodes are, respectively,
vectors of indices of various types of vertices, edges and live_edges
are, respectively, vectors of the respective types of edges, inp_alphabet is a
std::map object that maps input symbols of the Rabin automaton to the respective
BDDs representing sets of nodes in the Arena, and filename is the name
of the file in which the Rabin automaton is stored (using the standard HOA
format [2]). The game is solved by calling Fp.Rabin, a member function of the
Genie::BaseFixpoint class (see Sect. 3.1).


3.3 Mascot-SDS 


The core of Mascot-SDS is also written as a header-only library. It is built on 
top of the well-known tool called SCOTS [18], with several classes of Mascot-SDS 
still retaining their original identities from SCOTS, owing to the close similarity of 
the basic uniform grid-based abstraction used in both tools. The main difference 
between the two tools is that Mascot-SDS synthesizes controllers for stochastic 
systems, while SCOTS synthesizes controllers only for non-stochastic systems.

The two main classes of Mascot-SDS are called SymbolicSet and
SymbolicModel, which respectively model the abstract spaces obtained through
uniform grid-based discretizations (like the discretized state space in Sect. 2.2)
and the abstract transition relations (the over- and under-approximations of the
abstract transition function in Sect. 2.2). The abstract transition relations are
computed using an auxiliary class called SymbolicModelMonotonic (not shown in
Fig. 1). Notice that we offer the flexibility to use both CUDD and Sylvan while
creating objects from SymbolicSet and SymbolicModel. A Fixpoint object, whose
class derives from the class BaseFixpoint from Genie, is created by taking a
synchronous product between a SymbolicModel object and a RabinAutomaton
object specifying the control objective given as user input. The class Fixpoint
implements the concrete definitions of the Cpre and Apre operators according
to (6).

How to Use: For ease of use, we have written a pair of tools called Synthesize
and Simulate using the library of Mascot-SDS. Synthesize synthesizes con- 
trollers for stochastic dynamical systems whose nominal dynamics is mixed- 
monotone, and Simulate visualizes simulated closed-loop trajectories using the 
synthesized controller. The inputs to Synthesize include the dynamic model of 
the system and the control objective; the latter can be specified either in LTL or 
using a Rabin automaton. To use Synthesize, simply use the following syntax: 


<path-to-Synthesize binary>/Synthesize <path-to-input-file>/<input.cfg> 
<sylvan/cudd flag> 


where the <input.cfg> is an input configuration file containing all the inputs, 
and the <sylvan/cudd flag> is either 1 or 0 depending on whether the parallel 
version using Sylvan is to be run or the sequential version using CUDD. 

Some of the main ingredients in the input.cfg file are: (a) the descrip- 
tion of the dynamical system’s variable spaces (like state space, input space, 
etc.) including their discretization parameters, (b) the file where the decompo-
sition function of the nominal dynamics of the system is stored, (c) the abso- 
lute value of maximum disturbance, and (d) the specification either as an LTL 
formula or as the filename where a Rabin automaton is stored (in HOA for- 
mat [2]). The decomposition function is required to be given as a C-compatible 
header file so that Synthesize can link to (use) this function at runtime (see 
the mascot-sds/examples/ directory for examples). When the specification is 
given as a Rabin automaton (over a labeling alphabet of the system states), the 
automaton needs to be stored in a file in the HOA format. Alternatively, an LTL 
specification can be given, along with a mapping between the atomic predicates 
and the states of the system. In that case Synthesize uses Owl [12] to convert 
the LTL specification to a Rabin automaton. 

The output of Synthesize is a folder called data that contains pieces of the 
controller encoded in BDDs and stored in binary files as well as various metadata 
information stored in text files. These files can be processed by Simulate to 
visualize simulated closed-loop trajectories of the system. The usage of Simulate 
is similar to Synthesize: 


<path-to-Simulate binary>/Simulate <path-to-input-file>/<input.cfg> 
<sylvan/cudd flag> 


where the input.cfg file should, in this case, contain the information
required to simulate the closed loop, such as the simulation time steps and the
Python script that will plot the state space predicates (see the examples).


4 Examples 


We present experimental results, showcasing practical usability of our tools and 
comparing performances with the state of the art. All the experiments were run 
on a computer with Intel Xeon E7-8857 v2 48 core processor and 1.5 TB RAM. 


4.1 Synthesizing Code-Aware Resource Managers Using FairSyn


We consider a case study introduced by Chatterjee et al. [6]. In this exam- 
ple, there are two bounded FIFO queues, namely the broadcast and output 
queues, which interact among each other and transmit and receive data packets 
through a common network. The two queues are implemented using separate 
threads running on a single CPU. For this multi-threaded program, we con- 
sider the problem of synthesizing a code-aware resource manager, whose task is 
to grant different threads accesses to different shared synchronization resources 
(mutexes and counting semaphores). The specification is deadlock freedom across 
all threads at all times while assuming a fair scheduler (scheduling every thread
always eventually) and fair progress in every thread (i.e., taking every existing 
execution branch always eventually). The resource-manager is code-aware, and 
has knowledge about the require and release characteristics of all threads for 
different resources. This enables us to avoid deadlocks more effectively than the 
case when the resource-manager does not have access to the code. Chatterjee
et al. [6] showed that the synthesis problem (of the resource manager) can be
reduced to the problem of computing the winning strategy in a 2½-player game,
which we solved using FairSyn.

Table 1 compares the computational resources for the CUDD and Sylvan-based
implementations of FairSyn; more details can be found in our earlier work [4].
It can be observed that the Sylvan-based implementation is significantly faster
(by a factor of roughly 22 to 48 on these instances), although it consumes
roughly two to three times as much memory.


Table 1. Performance of FairSyn; code-aware resource management benchmark.

Broadcast and output | Number of BDD | Computation time (seconds) | Peak memory usage
queue capacities     | variables     | CUDD      | Sylvan         | CUDD    | Sylvan
(1, 1)               | 25            | 255.33    | 11.40          | 292 MiB | 671 MiB
(2, 1)               | 27            | 957.99    | 29.20          | 310 MiB | 681 MiB
(3, 1)               | 27            | 903.01    | 31.13          | 310 MiB | 973 MiB
(1, 2)               | 27            | 1308.09   | 39.57          | 315 MiB | 682 MiB
(1, 3)               | 27            | 1249.37   | 41.76          | 309 MiB | 681 MiB
(2, 2)               | 29            | 5127.93   | 111.62         | 342 MiB | 685 MiB
(3, 2)               | 29            | 5104.20   | 114.30         | 339 MiB | 975 MiB
(2, 3)               | 29            | 5644.09   | 118.12         | 341 MiB | 975 MiB
(3, 3)               | 29            | 6156.57   | 137.56         | 339 MiB | 975 MiB


4.2 Synthesizing Controllers for Stochastic Dynamical Systems 
Using Mascot-SDS 


We use Mascot-SDS to synthesize controllers for two different applications. 


A Bistable Switch. First, we compare our tool’s performance against the state- 
of-the-art tool called StochasticSynthesis (abbr. SS) [10] on a benchmark example 
that was proposed by the authors of SS. In this example, there is a 2-dimensional 
nonlinear bistable switch that is perturbed with bounded stochastic noise. There 
are two synthesis problems with two different control objectives: one, a safety 
objective, and, two, a Rabin objective with two Rabin pairs. The model of the 
system and the control objectives can be found in the original paper [10]. 

The tool SS uses graph theoretic techniques to solve the controller synthesis 
problem, which is an alternative approach that is substantially different from our 
symbolic fixpoint based technique. In Table 2, we summarize the performance of 
Mascot-SDS powered by CUDD and Sylvan, alongside the performance of SS. Both 
Table 2. Performance comparison between Mascot-SDS and StochasticSynthesis
(abbreviated as SS) [10] on the bistable switch. Col. 1 shows the specifications
and the respective numbers of Rabin pairs, Col. 2 shows the approximation error
ranges (smaller error means more intense computation), Col. 3, 4, and 5 compare
the total running time, and Col. 6, 7, and 8 compare the peak memory footprint
(as measured using the “time” command) for Mascot-SDS with CUDD, Mascot-SDS
with Sylvan, and SS respectively. “TO” stands for timeout (5h of cutoff time).


                   | Upper bound on | Total running time            | Peak memory footprint
                   | approx. error  | Mascot-SDS        | SS [10]   | Mascot-SDS        | SS [10]
Spec.              |                | CUDD     | Sylvan |           | CUDD    | Sylvan  |
p1 (1 Rabin pair)  | 20%-30%        | 11s      | <2s    | 27s       | 351 MiB | 79 MiB  | 223 MiB
                   | 10%-20%        | 9s       | 2s     | 43s       | 351 MiB | 105 MiB | 290 MiB
                   | 5%-10%         | 14s      | 4s     | 1h 49min  | 405 MiB | 251 MiB | 25 GiB
                   | 0%-5%          | 48s      | 10s    | TO        | 553 MiB | 759 MiB | TO
p2 (2 Rabin pairs) | 20%-30%        | 21s      | <2s    | 21s       | 324 MiB | 40 MiB  | 202 MiB
                   | 10%-20%        | 26s      | 2s     | 25s       | 371 MiB | 80 MiB  | 203 MiB
                   | 5%-10%         | 37s      | 4s     | 1min 17s  | 436 MiB | 242 MiB | 490 MiB
                   | 0%-5%          | 2min 24s | 13s    | TO        | 573 MiB | 761 MiB | TO


Table 3. Performance of Mascot-SDS with CUDD and Sylvan for the table-serving
robot experiment.

            | CUDD    | Sylvan
Comp. time  | 1h3min  | 2min55s
Peak memory | 673 MiB | 1.1 GiB


Fig. 2. Closed-loop trajectories for 100 time steps with kitchen (green), table
(blue), and obstacle (black). (Color figure online)


Mascot-SDS and SS compute controllers whose domains under-approximate the 
optimal controller domains. The second column of Table 2 shows a measure of
the approximation error. For every comparable approximation error bound, both 
versions of Mascot-SDS significantly outperformed SS, both time and memory- 
wise. In fact, Mascot-SDS with Sylvan was at least an order of magnitude faster 
in all instances. This is particularly astonishing, since SS uses a sophisticated 
lazy abstraction refinement technique, whereas Mascot-SDS uses a plain uni- 
form abstraction which is typically computationally expensive. This shows the 
immense potential of our toolchain; we plan to extend Mascot-SDS with lazy 
gridding, an orthogonal optimization, in a future release to make further com- 
putational savings. For Mascot-SDS itself, as expected, Sylvan was significantly 
faster than CUDD. On the other hand, though Sylvan used less memory than 
CUDD in the simpler setups (the ones with more error), the memory requirement 
of Sylvan quickly grew and surpassed that of CUDD for the more complicated 
setup. 
Table-Serving Robot. We consider the controller synthesis problem for a
table-serving robot that needs to satisfy the following specification:
$\Box\Diamond\,\mathit{kitchen} \wedge \neg\mathit{obstacle} \wedge
(\Box\Diamond\,\mathit{request} \to \Box\Diamond\,\mathit{table})$, where table,
kitchen, obstacle, and request are predicates over the state space. The robot
itself is modeled as the discrete-time abstraction of the standard 3-dimensional
Dubins vehicle [15] with an additional (i.e., 4th) dimension that records if a
request, which is controlled by the environment, is pending. In Table 3, we
summarize the computational resources, and, in Fig. 2, we show a simulated
closed-loop trajectory that was plotted using our tool Simulate. We observe that
Sylvan was much faster, but CUDD consumed much less memory.
Abstract. Probabilistic recurrence relations (PRRs) are a standard formalism
for describing the runtime of a randomized algorithm. Given a PRR and a time
limit $\kappa$, we consider the tail probability $\Pr[T \ge \kappa]$, i.e., the
probability that the randomized runtime $T$ of the PRR exceeds $\kappa$. Our
focus is the formal analysis of tail bounds that aims at finding a tight
asymptotic upper bound $u \ge \Pr[T \ge \kappa]$. To address this problem, the
classical and most well-known approach is the cookbook method by Karp (JACM
1994), while other approaches are mostly limited to deriving tail bounds of
specific PRRs via involved custom analysis.

In this work, we propose a novel approach for deriving the common
exponentially-decreasing tail bounds for PRRs whose preprocessing time and
random passed sizes follow discrete or (piecewise) uniform distributions and
whose recursive call is either a single procedure call or a divide-and-conquer.
We first establish a theoretical approach via Markov’s inequality, and then
instantiate the theoretical approach with a template-based algorithmic approach
via a refined treatment of exponentiation. Experimental evaluation shows that
our algorithmic approach is capable of deriving tail bounds that (i) are
asymptotically tighter than Karp’s method, (ii) match the best-known
manually-derived asymptotic tail bound for QuickSelect, and (iii) are only
slightly worse (by a log log n factor) than the manually-proven optimal
asymptotic tail bound for QuickSort. Moreover, our algorithmic approach handles
all examples (including realistic PRRs such as QuickSort, QuickSelect,
DiameterComputation, etc.) in less than 0.1 s, showing that our approach is
efficient in practice.
1 Introduction


Probabilistic program verification is a fundamental area in formal verification [3]. 
It extends the classical (non-probabilistic) program verification by considering 
randomized computation in a program and hence can be applied to the formal 
analysis of probabilistic computations such as probabilistic models [14], ran- 
domized algorithms [2,9, 28,30], etc. In this line of research, verifying the time 
complexity of probabilistic recurrence relations (PRRs) is an important sub- 
ject [9,30]. PRRs are a simplified form of recursive probabilistic programs and 
extend recurrence relations by incorporating randomization such as randomized 
preprocessing and divide-and-conquer. They are widely used in analyzing the 
time complexity of randomized algorithms (e.g., QuickSort [16], QuickSelect [17], 
and DiameterComputation [26, Chapter 9]). Compared with probabilistic pro- 
grams, PRRs abstract away detailed computational aspects, such as problem- 
specific divide-and-conquer and data-structure manipulations, and include only 
key information on the runtime of the underlying randomized algorithm. Hence, 
PRRs provide a clean model for time-complexity analysis of randomized algo- 
rithms and randomized computations in a general sense. 

In this work, we focus on the formal analysis of PRRs and consider the 
fundamental problem of tail bound analysis that aims at bounding the proba- 
bility that a given PRR does not terminate within a prescribed time limit. In 
the literature, prominent works on tail bound analysis include the following. 
First, Karp proposed a classic “cookbook” formula [21] similar to the Master
Theorem. This method is further improved, extended, and mechanized by follow-up
works [5,13,30]. While Karp’s method has a clean form and is easy to use
and automate, the bounds from the method are known to be not tight (see
e.g. [15,25]). Second, the works [25] and [15] performed ad-hoc custom
analysis to derive asymptotically tight tail bounds for the PRRs of QuickSort
and QuickSelect, respectively. These methods require manual effort and do
not have the generality to handle a wide class of PRRs.

From the literature, an algorithmic approach capable of deriving tight tail
bounds over a wide class of PRRs is a major unresolved problem. Motivated by
this challenge, we make the following contributions in this work:


— Based on Markov’s inequality, we propose a novel theoretical approach to 
derive exponentially-decreasing tail bounds, a common type for many ran- 
domized algorithms. We further show that our theoretical approach can 
always derive an exponentially-decreasing tail bound at least as tight as 
Karp’s method under mild assumptions. 

— From our theoretical approach, we propose a template-based algorithmic app- 
roach for a wide class of PRRs that have (i) common probability distributions 
such as (piecewise) uniform distribution and discrete probability distributions 
and (ii) either a single call or a divide-and-conquer for the form of the recur- 
sive call. The technical novelties in our algorithm lie in a refined treatment 
of the estimation of the exponential term arising from our theoretical app- 
roach via integrals, suitable over-approximation, and the monotonicity of the 
template function. 
— Experiments show that our algorithmic approach derives asymptotically
tighter tail bounds when compared with Karp’s method. Furthermore, the tail
bounds derived from our approach match the best-known bound for QuickSelect
[15], and are only slightly worse by a log log n factor against the optimal
manually-derived bound for QuickSort [25]. Moreover, our algorithm synthesizes
each of these tail bounds in less than 0.1s and is efficient in practice.


A limitation of our approach is that we do not consider the transformation
from a realistic implementation of a randomized algorithm into its PRR
representation. However, such a transformation would require examining a
diverse range of randomization patterns (e.g., randomized divide-and-conquer)
in randomized algorithms and thus is an orthogonal direction. In this work, we
focus on the tail bound analysis and present a novel approach to address this
problem. Due to space limitations, we relegate some details to the extended
version [29].


2 Preliminaries 


Below we present necessary background in probability theory and the tail bound 
analysis problem we consider. 

A probability space is a triple $(\Omega, \mathcal{F}, \Pr)$ such that $\Omega$
is a non-empty set termed the sample space, $\mathcal{F}$ is a $\sigma$-algebra
over $\Omega$ (i.e., a collection of subsets of $\Omega$ that contains the empty
set $\emptyset$ and is closed under complement and countable union), and
$\Pr(\cdot)$ is a probability measure on $\mathcal{F}$ (i.e., a function
$\mathcal{F} \to [0,1]$ such that $\Pr(\Omega) = 1$ and for every pairwise
disjoint set-sequence $A_1, A_2, \dots$ in $\mathcal{F}$, we have that
$\sum_{i \ge 1} \Pr(A_i) = \Pr\bigl(\bigcup_{i \ge 1} A_i\bigr)$).

A random variable $X$ from a probability space $(\Omega, \mathcal{F}, \Pr)$ is
an $\mathcal{F}$-measurable function $X : \Omega \to \mathbb{R}$, i.e., for
every $d \in \mathbb{R}$, we have that
$\{\omega \in \Omega \mid X(\omega) < d\} \in \mathcal{F}$. We denote
$\mathbb{E}[X]$ as its expected value; formally, we have
$\mathbb{E}[X] := \int X \,\mathrm{d}\Pr$. A discrete probability distribution
(DPD) over a countable set $U$ is a function $\eta : U \to [0,1]$ such that
$\sum_{u \in U} \eta(u) = 1$. The support of the DPD is defined as
$\mathrm{supp}(\eta) := \{u \in U \mid \eta(u) > 0\}$. We abbreviate
finite-support DPD as FSDPD.

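A filtration of a probability space $(\Omega, \mathcal{F}, \Pr)$ is an infinite
sequence $\{\mathcal{F}_n\}_{n \ge 0}$ of $\sigma$-algebras over $\Omega$ such
that $\mathcal{F}_n \subseteq \mathcal{F}_{n+1} \subseteq \mathcal{F}$ for every
$n \ge 0$. Intuitively, it models the information at the n-th step. A
discrete-time stochastic process is an infinite sequence
$\Gamma = \{X_n\}_{n \ge 0}$ of random variables from the probability space
$(\Omega, \mathcal{F}, \Pr)$. The process $\Gamma$ is adapted to a filtration
$\{\mathcal{F}_n\}_{n \ge 0}$ if for all $n \ge 0$, $X_n$ is
$\mathcal{F}_n$-measurable. Given a filtration $\{\mathcal{F}_n\}_{n \ge 0}$, a
stopping time is a random variable $\tau : \Omega \to \mathbb{N}$ such that for
every $n \ge 0$, $\{\omega \in \Omega \mid \tau(\omega) \le n\} \in \mathcal{F}_n$.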

A discrete-time stochastic process $\Gamma = \{X_n\}_{n \in \mathbb{N}}$ adapted
to a filtration $\{\mathcal{F}_n\}_{n \in \mathbb{N}}$ is a martingale (resp.
supermartingale) if for every $n \in \mathbb{N}$, $\mathbb{E}[|X_n|] < \infty$
and it holds a.s. that $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n$ (resp.
$\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] \le X_n$). Intuitively, a martingale
(resp. supermartingale) is a discrete-time stochastic process in which for an
observer who has seen the values of $X_0, \dots, X_n$, the expected value at the
next step, i.e. $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n]$, is equal to (resp. no
more than) the last observed value $X_n$. Also, note that in a martingale, the
observed values for $X_0, \dots, X_{n-1}$ do not matter given that
$\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n$. In contrast, in a
supermartingale, the only requirement is that
$\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] \le X_n$ and hence
$\mathbb{E}[X_{n+1} \mid \mathcal{F}_n]$ may depend on $X_0, \dots, X_{n-1}$.
Also, note that $\mathcal{F}_n$ might contain more information than just the
observations of the $X_i$’s.


Example 1. Consider the classical gambler’s ruin: a gambler starts with $Y_0$
dollars of money and bets continuously until he loses all of his money. If the
bets are unfair, i.e. the expected value of his money after a bet is less than
its expected value before the bet, then the sequence
$\{Y_n\}_{n \in \mathbb{N}_0}$ is a supermartingale. In this case, $Y_n$ is the
gambler’s total money after n bets. On the other hand, if the bets are fair,
then $\{Y_n\}_{n \in \mathbb{N}_0}$ is a martingale.
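The distinction in Example 1 can be checked with a one-step computation. The sketch below assumes, purely for illustration, that every bet wins or loses exactly one dollar and that the gambler still has money; the function name `expected_next` is hypothetical.

```python
from fractions import Fraction


def expected_next(money, p_win):
    """E[Y_{n+1} | Y_n = money] for a one-dollar bet won with probability p_win.

    Assumes money > 0, i.e. the gambler has not yet been ruined.
    """
    return p_win * (money + 1) + (1 - p_win) * (money - 1)


# Fair bet: the conditional expectation equals the current value (martingale).
fair = expected_next(10, Fraction(1, 2))

# Unfair bet: the conditional expectation drops below it (supermartingale).
unfair = expected_next(10, Fraction(2, 5))

if __name__ == "__main__":
    print(fair, unfair)
```

Exact rationals are used so that the equality in the martingale case is not blurred by floating-point rounding.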


We refer to standard textbooks (such as [6,34]) for a detailed treatment of all 
the concepts illustrated above. 


2.1 Probabilistic Recurrence Relations 


In this work, we focus on probabilistic recurrence relations (PRRs) that describe 
the runtime behaviour of a single recursive procedure. Instead of having a direct 
syntax for a PRR, we propose a mini programming language LRec that cap- 
tures a wide class of PRRs that have common probability distributions such 
as (piecewise) uniform distributions and discrete probability distributions, and 
whose recursive call consists of either a procedure call or two procedure calls in 
a divide-and-conquer style. We present the grammar of LRec in Fig. 1. 


(PRR)            proc ::= def p(n; c_p) = {comm}
(Command)        comm ::= sample v ← dist in {body} | ⊕_{i=1}^{k} c_i : comm_i
(Recursive Body) body ::= pre(expr); invoke call
(Recursive Call) call ::= p(v); p(size − v) | p(v) | p(size − v)
                          (where size is either ⌊n/b⌋ + c or ⌈n/b⌉ + c)
(Distribution)   dist ::= uniform(n) | muniform(n) | discrete | ...
(Expression)     expr ::= v | v^{-1} | ln v | n | ln n | n^{-1} | c
                        | expr + expr | expr − expr | expr × expr


Fig. 1. The Grammar of LRec 


In the grammar, we have two positive-integer valued variables $n, v$ which
stand for the input size and the sampled value in the randomization of the
passed size to the recursive calls of a procedure, respectively. We use
$b > 0$, $c$, $c_p$ to denote integer constants, and use p to denote the name of
the single procedure in the PRR. We consider arithmetic expressions expr as
polynomials over $v, v^{-1}, \ln v$ and $n, n^{-1}, \ln n$ (which we call
pseudo-polynomials in this work) and common probability distributions,
including (i) the uniform distribution uniform(n) over $\{0, 1, \dots, n-1\}$,
(ii) the piecewise uniform distribution muniform(n) that returns
$\max\{i, n-i-1\}$ where $i$ follows the uniform distribution uniform(n), and
(iii) any FSDPD (indicated by discrete) whose probabilities and values are
constants and pseudo-polynomials, respectively. We also support other piecewise
uniform distributions, e.g., a distribution that assigns one probability to
each $v \in \{0, \dots, n/2\}$ and another to each
$v \in \{n/2+1, \dots, n-1\}$.

The nonterminal proc generates the PRR in the form def p(n; cp) = {comm}, 
for which cp is an integer constant as the threshold of recursion, meaning that 
the procedure halts immediately when n < cp, and comm is the function body 
of the procedure. The nonterminal comm generates all statements with one of 
the two forms as follows. 


— A sampling statement (indicated by sample) followed by first a special expres- 
sion pre(expr) that stands for the preprocessing time of expr amount, then 
the recursive calls generated by the nonterminal call. 

— A probabilistic choice in the form $\bigoplus_{i=1}^{k} c_i : \mathit{comm}_i$
where each statement $\mathit{comm}_i$ is executed with probability $c_i$.


We restrict the recursive calls to be either a single recursive call p(v) or 
p(size — v), or a divide-and-conquer composed of two consecutive recursive calls 
p(v) and p(size — v), for which we consider a general setting that the relevant 
overall size size is in the form of the input size n divided by some positive integer 
b with possibly an offset c. Choosing b = 1,c = —1 means the normal situation 
that the overall size is n — 1, i.e., removing one element from the original input. 

Given a PRR p, we use func(p) to represent its function body. 

We always assume that the given PRR is well-formed, i.e., every $c_i$ in a
probabilistic choice is within [0,1] and every random passed size (e.g. v,
size − v) falls in [0,n]. Below, we present two examples of PRRs.


Example 2 (QuickSelect). Consider the problem of finding the d-th smallest 
element in an unordered array of n distinct elements. A classical randomized 
algorithm for solving this problem is QuickSelect [17] with O(n) expected run- 
ning time. We model the algorithm as the following PRR: 


def p(n; 2) = {sample v ← muniform(n) in {pre(n); invoke p(v); }}


Here, we use p(n; 2) to represent the number of comparisons performed by
QuickSelect over an input of size n, and v is the variable that captures the
size of the remaining array that has to be searched recursively. It takes the
value $\max\{i, n-1-i\}$, where $i$ is sampled uniformly from
$\{0, \dots, n-1\}$; we use muniform(n) to represent this distribution.


Example 3 (QuickSort). Consider the classical problem of sorting an array of n
distinct elements. A well-known randomized algorithm for solving this problem
is QuickSort [16]. We model the algorithm as the following PRR.


def p(n; 2) = {sample v ← uniform(n) in {pre(n); invoke p(v); p(n − 1 − v); }}


Here, v and n — 1 — v capture the sizes of the two sub-arrays. 


Below we present the semantics of a PRR in a nutshell. Consider a PRR
generated by LRec with the procedure name p. A configuration $\sigma$ is a pair
$\sigma = (\mathit{comm}, \hat{n})$ where comm represents the current statement
to be executed and $\hat{n} \ge c_p$ is the current value for the variable n. A
PRR state $\mu$ is a triple $(\sigma, C, K)$ for which:


— $\sigma$ is either a configuration, or halt for the termination of the whole PRR.
— $C \ge 0$ records the cumulative preprocessing time so far.
— K is a stack of configurations that remain to be executed. 


We use emp to denote an empty stack, and say that a PRR state $(\sigma, C, K)$
is final if $K = \mathit{emp}$ and $\sigma = \mathit{halt}$. Note that in a
final PRR state (halt, C, emp), the value C represents the total execution
runtime of the PRR. The semantics of the PRR is defined as a discrete-time
Markov chain whose state space is the set of all PRR states and whose
transition function is $P$, where $P(\mu, \mu')$ is the probability that the
next PRR state is $\mu'$ given that the current PRR state is
$\mu = ((\mathit{comm}, \hat{n}), C, K)$. The probability is determined by the
following cases.


— For final PRR states $\mu$, $P(\mu, \mu) := 1$ and $P(\mu, \mu') := 0$ for
other $\mu' \ne \mu$. This means that the PRR stays at termination once it
terminates.

— In the divide-and-conquer case comm = sample v ← dist in {pre(e);
invoke p(v); p(s − v)}, we first sample v from the distribution dist. Then, with
probability dist(v), we accumulate the preprocessing time e into the cumulative
processing time C. We recursively invoke p(v) and push the remaining task
p(s − v) onto the stack. The probability for the single recursion case is
defined analogously. The only difference is that there is no need to push a
recursive call onto the stack in the single recursion case.

— In the case $\mathit{comm} = \bigoplus_{i=1}^{k} c_i : \mathit{comm}_i$, we
have that $P(\mu, \mu_i) = c_i$ for each $1 \le i \le k$, where
$\mu_i := ((\mathit{comm}_i, \hat{n}), C, K)$.
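The stack-based semantics above can be sketched as a sampler for the two example PRRs (QuickSelect and QuickSort, both with preprocessing time pre(n) and threshold 2). This is an illustrative reading of the semantics, not the authors' implementation; the function name `sample_runtime` and its parameters are hypothetical.

```python
import random


def sample_runtime(n_star, divide_and_conquer, rng, threshold=2):
    """Draw one sample of the cumulative preprocessing time C of a PRR run.

    divide_and_conquer=False models QuickSelect: v ~ muniform(n), invoke p(v).
    divide_and_conquer=True models QuickSort: v ~ uniform(n), invoke p(v); p(n-1-v).
    """
    total = 0          # cumulative preprocessing time C
    stack = [n_star]   # sizes of the calls that remain to be executed
    while stack:
        n = stack.pop()
        if n < threshold:          # the procedure halts immediately
            continue
        total += n                 # pre(n)
        i = rng.randrange(n)       # uniform(n) over {0, ..., n-1}
        if divide_and_conquer:
            stack.append(n - 1 - i)    # remaining task p(n - 1 - v)
            stack.append(i)            # invoked first: p(v)
        else:
            stack.append(max(i, n - 1 - i))  # muniform(n)
    return total


if __name__ == "__main__":
    rng = random.Random(2023)
    runs = [sample_runtime(1000, False, rng) for _ in range(1000)]
    print(sum(runs) / len(runs))   # empirical mean runtime of the QuickSelect PRR
```

Repeating the sampling and counting how often the result exceeds a chosen limit gives an empirical estimate of the tail probability that the analysis bounds symbolically.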


With an initial PRR state $((\mathit{func}(p), n^*), 0, \mathit{emp})$ where
$n^* \ge c_p$ is the input size, the Markov chain induces a probability space
where the sample space is the set of all infinite sequences of PRR states, the
$\sigma$-algebra is generated by all cylinder sets over infinite sequences of
PRR states, and the probability measure is uniquely determined by the
transition function $P$. We refer to [3] for details. We use $\Pr_{n^*}$ for
the probability measure where $n^* \ge c_p$ is the input size.

We further define the random variable $\tau$ such that for any infinite
sequence of PRR states $\rho = \mu_0, \mu_1, \dots, \mu_t, \dots$ with each
$\mu_t = ((\mathit{comm}_t, \hat{n}_t), C_t, K_t)$, $\tau(\rho)$ equals the
first moment that the sequence reaches a final PRR state, i.e.,
$\tau(\rho) := \inf\{t \mid \mu_t \text{ is final}\}$, for which
$\inf \emptyset = \infty$. We will always ensure that $\tau$ is almost-surely
finite, i.e., $\Pr_{n^*}(\tau < \infty) = 1$. Note that the random cumulative
processing time $C_\tau$ in the PRR state $\mu_\tau \in \rho$ is the total
execution time of the given PRR.

We formulate the tail bound analysis over PRRs as follows. Given a time
limit $\alpha \cdot \kappa(n^*)$ symbolic in the initial input $n^*$ and the
coefficient $\alpha$, the goal of tail bound analysis is to infer an upper bound
$u(\alpha, n^*)$ symbolic in $n^*$ and $\alpha$ such that for every input size
$n^*$ and plausible value for $\alpha$, we have that

$$\Pr_{n^*}[C_\tau \ge \alpha \cdot \kappa(n^*)] \le u(\alpha, n^*). \tag{1}$$
As tail bounds are often evaluated asymptotically, we focus on deriving tight
$u(\alpha, n^*)$ when $\alpha, n^*$ are sufficiently large. To compare the
magnitude of two tail bounds, we first treat $\alpha$ as a fixed constant and
compare the bounds over $n^*$; if the magnitude over $n^*$ is identical, we
then compare the magnitude with respect to the coefficient $\alpha$.


Example 4 (Our result on QuickSelect). Continuing with Example 2, suppose the
user is interested in the tail bound $\Pr[C_\tau \ge \alpha \cdot n^*]$, where
$C_\tau$ is the running time of the QuickSelect algorithm over an array with
length $n^*$. Then, Karp’s method produces the symbolic tail bound as follows.

$$\Pr[C_\tau \ge \alpha \cdot n^*] \le \exp(1.15 - 0.28 \cdot \alpha)$$

However, our method can produce the following tail bound.

$$\Pr[C_\tau \ge \alpha \cdot n^*] \le \exp(2 \cdot \alpha - \alpha \cdot \ln \alpha)$$

Note that our method produces tail bounds with a better magnitude on $\alpha$.
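To see what "a better magnitude on the coefficient" means numerically, the two QuickSelect bounds of Example 4 can be evaluated side by side. This is a small sketch; the function names are hypothetical and the constants are those printed in the example.

```python
import math


def karp_quickselect(alpha):
    """Karp's bound from Example 4: exp(1.15 - 0.28 * alpha)."""
    return math.exp(1.15 - 0.28 * alpha)


def ours_quickselect(alpha):
    """The bound from Example 4: exp(2 * alpha - alpha * ln(alpha))."""
    return math.exp(2 * alpha - alpha * math.log(alpha))


if __name__ == "__main__":
    for alpha in (5, 10, 20, 40):
        print(alpha, karp_quickselect(alpha), ours_quickselect(alpha))
```

For small coefficients the linear exponent of Karp's bound can still be smaller, but the extra logarithmic factor in the second exponent dominates as the coefficient grows, which is the asymptotic regime the comparison is about.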


Example 5 (Our result on QuickSort). Continuing with Example 3, consider the
tail bound $\Pr[C_\tau \ge \alpha \cdot n^* \cdot \ln n^*]$, where $C_\tau$ is
the running time of QuickSort over a length-$n^*$ array. Then, Karp’s method
produces the symbolic tail bound as:

$$\Pr[C_\tau \ge \alpha \cdot n^* \cdot \ln n^*] \le \exp(0.5 - 0.5 \cdot \alpha),$$

while our method can produce the bound as:

$$\Pr[C_\tau \ge \alpha \cdot n^* \cdot \ln n^*] \le \exp((4 - \alpha) \cdot \ln n^*)$$

Note that our method produces tail bounds with a better magnitude on $n^*$.


3 Exponential Tail Bounds via Markov’s Inequality 


In this section, we demonstrate our theoretical approach for deriving exponen- 
tially decreasing tail bounds based on Markov’s inequality. 

Before illustrating our approach, we first translate a PRR in the language 
LRec with the single procedure p into the canonical form as follows. 


$$p(n; c_p) = \mathit{pre}(S(n));\ \mathit{invoke}\ p(\mathit{size}_1(n)); \dots; p(\mathit{size}_r(n)) \tag{2}$$


where (i) $S(n)$ is a random variable related to the input size n that
represents the randomized pre-processing time and follows a probability
distribution resulting from a discrete probabilistic choice of piecewise
uniform distributions, and (ii) invoke $p(\mathit{size}_1(n)); \dots;
p(\mathit{size}_r(n))$ is a statement that is either a single recursive call
$p(\mathit{size}_1(n))$ or a divide-and-conquer
$p(\mathit{size}_1(n)); p(\mathit{size}_2(n))$ upon the resolution of the
randomization. We use a random variable $r$ (which is either 1 or 2) to
represent the number of recursive calls.
The translation can be implemented by a straightforward recursive procedure
Tf(n, Prog) that takes as input a positive integer n (as the input size) and a
statement Prog (generated by the nonterminal comm) to be processed. Note that
the procedure Tf(n, Prog) outputs the joint distribution of the random value
$S(n)$ and the recursive call $p(\mathit{size}_1(n)); \dots;
p(\mathit{size}_r(n))$ with randomized input size. These random variables may
be dependent.

Our theoretical approach then works directly on the canonical form (2). It
consists of two major steps to derive an exponentially-decreasing tail bound. In
the first step, we apply Markov’s inequality and reduce the tail bound analysis
problem to the over-approximation of the moment generating function
$\mathbb{E}[\exp(t \cdot C_\tau)]$ where $C_\tau$ is the cumulative
pre-processing time defined previously and $t > 0$ is a scaling factor that aids
the derivation of the tail bound. In the second step, we apply the Optional
Stopping Theorem (a classical theorem in martingale theory) to over-approximate
the expected value $\mathbb{E}[\exp(t \cdot C_\tau)]$. Below we fix a PRR with
procedure p in the canonical form (2), and a time limit
$\alpha \cdot \kappa(n^*)$.

Our first step applies Markov’s inequality. Our approach relies on the well- 
known exponential form of Markov’s inequality below. 


Theorem 1. For every random variable $X$ and any scaling factor $t > 0$, we
have that $\Pr[X \ge d] \le \mathbb{E}[\exp(t \cdot X)] / \exp(t \cdot d)$.


The detailed application of Markov’s inequality to tail bound analysis
requires choosing a scaling factor $t := t(\alpha, n)$ symbolic in $\alpha$ and
$n$. After choosing the scaling factor, Markov’s inequality gives the following
tail bound:

$$\Pr[C_\tau \ge \alpha \cdot \kappa(n^*)] \le \mathbb{E}[\exp(t(\alpha, n^*) \cdot C_\tau)] / \exp(t(\alpha, n^*) \cdot \alpha \cdot \kappa(n^*)). \tag{3}$$


The role of the scaling factor $t(\alpha, n^*)$ is to scale the exponent in the
term $\exp(\kappa(\alpha, n^*))$, and this is in many cases necessary as a tail
bound may not be exponentially decreasing directly in the time limit
$\alpha \cdot \kappa(n^*)$.
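The effect of the scaling factor can be seen on a toy distribution. The sketch below is illustrative only: it takes X uniform over {0, ..., 9} (an assumption unrelated to any PRR) and compares the exact tail probability with the exponential Markov bound for several values of t; the function names are hypothetical.

```python
import math


def tail_probability(values, d):
    """Exact Pr[X >= d] for X uniform over the given list of values."""
    return sum(1 for x in values if x >= d) / len(values)


def exponential_markov_bound(values, d, t):
    """E[exp(t * X)] / exp(t * d) for X uniform over the given list of values."""
    mgf = sum(math.exp(t * x) for x in values) / len(values)
    return mgf / math.exp(t * d)


VALUES = list(range(10))  # X uniform over {0, ..., 9}

if __name__ == "__main__":
    d = 9
    print("exact tail:", tail_probability(VALUES, d))
    for t in (0.1, 0.5, 1.0, 2.0):
        print(t, exponential_markov_bound(VALUES, d, t))
```

Every choice of t > 0 yields a valid upper bound, but the quality of the bound varies with t, which is why the approach treats the scaling factor as a symbolic function to be chosen rather than a fixed constant.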

An unsolved part in the tail bound above is the estimation of the expected
value $\mathbb{E}[\exp(t(\alpha, n^*) \cdot C_\tau)]$, which our second step
over-approximates. To achieve this goal, we impose a constraint on the scaling
factor $t(\alpha, n)$ and an extra function $f(\alpha, n)$ and show that once
the constraint is fulfilled, one can derive an upper bound for
$\mathbb{E}[\exp(t(\alpha, n^*) \cdot C_\tau)]$ from $t(\alpha, n)$ and
$f(\alpha, n)$. This result (Theorem 2) is proved via the Optional Stopping
Theorem. It requires the almost-sure termination of the given PRR, a natural
prerequisite of an exponential tail bound. In this work, we consider PRRs with
finite termination time, which implies almost-sure termination.


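Theorem 2. Suppose we have functions
$t, f : [0, \infty) \times \mathbb{N} \to [0, \infty)$ such that

$$\mathbb{E}[\exp(t(\alpha, n) \cdot \mathsf{Ex}(n \mid f))] \le \exp(t(\alpha, n) \cdot f(\alpha, n)) \tag{4}$$

for all sufficiently large $\alpha, n^* > 0$ and all $c_p \le n \le n^*$, where

$$\mathsf{Ex}(n \mid f) := S(n) + \sum_{i=1}^{r} f(\alpha, \mathit{size}_i(n)).$$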
Then for $t_*(\alpha, n^*) := \min_{c_p \le n \le n^*} t(\alpha, n)$, we have
that

$$\mathbb{E}[\exp(t_*(\alpha, n^*) \cdot C_\tau)] \le \mathbb{E}[\exp(t_*(\alpha, n^*) \cdot f(\alpha, n^*))].$$

Thus, we obtain the upper bound
$u(\alpha, n^*) := \exp(t_*(\alpha, n^*) \cdot (f(\alpha, n^*) - \alpha \cdot \kappa(n^*)))$
for the tail bound in (1).


Proof Sketch. We fix a procedure p, and some sufficiently large $\alpha$ and
$n^*$. In general, we apply martingale theory to prove this theorem. To
construct a martingale, we need to make two preparations.

First, by the convexity of $\exp(\cdot)$, substituting $t(\alpha, n)$ with
$t_*(\alpha, n^*)$ in (4) does not affect the validity of (4).

Second, given an infinite sequence of PRR states $\rho = \mu_0, \mu_1, \dots$
in the sample space, we consider the subsequence
$\rho' = \mu'_0, \mu'_1, \dots$ as follows, where we represent $\mu'_i$ as
$((\mathit{func}(p), \hat{n}'_i), C'_i, K'_i)$. It only contains states that are
either final or at the entry of p, i.e., comm = func(p). We define
$\tau' := \inf\{t : \mu'_t \text{ is final}\}$; then it is straightforward that
$C'_{\tau'} = C_\tau$. We observe that $\mu'_{i+1}$ represents the recursive
calls of $\mu'_i$. Thus, we can characterize the conditional distribution
$\mu'_{i+1} \mid \mu'_i$ by the transformation function
$\mathit{Tf}(\hat{n}'_i, \mathit{func}(p))$ as follows.


— We first draw $(S, \mathit{size}_1, \mathit{size}_2, r)$ from
$\mathit{Tf}(\hat{n}'_i, \mathit{func}(p))$.

— We accumulate $S$ into the global cost. If there is a single recursion
($r = 1$), we invoke this sub-procedure. If there are two recursive calls, we
push the second call $p(\mathit{size}_2)$ onto the stack and invoke the first
one $p(\mathit{size}_1)$.


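Now we construct the super-martingale as follows. For each $i \ge 0$, we denote
the stack $K'_i$ for $\mu'_i$ as
$(\mathit{func}(p), s_{i,1}) \cdots (\mathit{func}(p), s_{i,q_i})$, where $q_i$
is the stack size. We prove that the process $y_0, y_1, \dots$ forms a
super-martingale, where
$y_i := \exp\bigl(t_*(\alpha, n^*) \cdot \bigl(C'_i + f(\alpha, \hat{n}'_i) + \sum_{j=1}^{q_i} f(\alpha, s_{i,j})\bigr)\bigr)$.
Note that $y_0 = \exp(t_*(\alpha, n^*) \cdot f(\alpha, n^*))$, and
$y_{\tau'} = \exp(t_*(\alpha, n^*) \cdot C'_{\tau'}) = \exp(t_*(\alpha, n^*) \cdot C_\tau)$.
Thus we informally have that
$\mathbb{E}[\exp(t_*(\alpha, n^*) \cdot C_\tau)] = \mathbb{E}[y_{\tau'}] \le \mathbb{E}[y_0] = \exp(t_*(\alpha, n^*) \cdot f(\alpha, n^*))$
and the theorem follows.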

It is natural to ask whether our theoretical approach can always find an
exponentially-decreasing tail bound over PRRs. We answer this question by
showing that under a difference-boundedness condition and a monotonicity
condition, the answer is yes. We first present the difference-boundedness
condition (A1) and the monotonicity condition (A2) for a PRR $\Delta$ in the
canonical form (2) as follows.


(A1) $\Delta$ is difference-bounded if there exist two real constants
$M_1 \le M_2$ such that for every $n \ge c_p$, and every possible value
$(V, s_1, \dots, s_k)$ in the support of the probability distribution
$\mathit{Tf}(n, \mathit{func}(p))$, we have that

$$M_1 \cdot \mathbb{E}[S(n)] \le V + \Bigl(\sum_{i=1}^{k} \mathbb{E}[p(s_i)]\Bigr) - \mathbb{E}[p(n)] \le M_2 \cdot \mathbb{E}[S(n)].$$

(A2) $\Delta$ is expected non-decreasing if $\mathbb{E}[S(n)]$ does not
decrease as n increases.
In other words, (A1) says that for any possible concrete pre-processing time
$V$ and passed sizes $s_1, \dots, s_k$, the difference between the expected
runtime before and after the recursive call is bounded by the magnitude of the
expected pre-processing time. (A2) simply specifies that the expected
pre-processing time be monotonically non-decreasing.

With the conditions (A1) and (A2), our theoretical approach guarantees a
tail bound that is exponentially decreasing in the coefficient $\alpha$ and the
ratio $\mathbb{E}[p(n^*)] / \mathbb{E}[S(n^*)]$. The theorem statement is as
follows.


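Theorem 3. Let $\Delta$ be a PRR in the canonical form (2). If $\Delta$
satisfies (A1) and (A2), then for any function
$w : [1, \infty) \to (1, \infty)$, the functions $f, t$ given by

$$f(\alpha, n) := w(\alpha) \cdot \mathbb{E}[p(n)] \quad\text{and}\quad t(\alpha, n) := \frac{\lambda(\alpha)}{\mathbb{E}[S(n)]}, \quad\text{with}\quad \lambda(\alpha) := \frac{8\,(w(\alpha) - 1)}{w(\alpha)^2 (M_2 - M_1)^2},$$

fulfill the constraint (4) in Theorem 2. Furthermore, by choosing
$w(\alpha) := \frac{2\alpha}{1 + \alpha}$ in the functions $f, t$ above and
$\kappa(\alpha, n^*) := \alpha \cdot \mathbb{E}[p(n^*)]$, one obtains the tail
bound

$$\Pr[C_\tau \ge \alpha\,\mathbb{E}[p(n^*)]] \le \exp\Bigl(-\frac{2(\alpha - 1)^2}{\alpha (M_2 - M_1)^2} \cdot \frac{\mathbb{E}[p(n^*)]}{\mathbb{E}[S(n^*)]}\Bigr).$$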


Proof Sketch. We first rephrase the constraint (4) as

$$\mathbb{E}\Bigl[\exp\Bigl(t(\alpha, n) \cdot \Bigl(S(n) + \sum_{i=1}^{r} f(\alpha, \mathit{size}_i(n)) - f(\alpha, n)\Bigr)\Bigr)\Bigr] \le 1.$$

Then we focus on the exponent inside $\exp(\cdot)$. By (A1), the exponent is a
bounded random variable. By further calculating its expectation and applying
Hoeffding’s Lemma [18], we obtain the theorem above.

Note that since $\mathbb{E}[p(n)] \ge \mathbb{E}[S(n)]$ when $n \ge c_p$, the
tail bound is at least exponentially decreasing with respect to the coefficient
$\alpha$. This implies that our theoretical approach derives tail bounds that
are at least as tight as Karp’s method when (A1) and (A2) hold. When
$\mathbb{E}[p(n)]$ is of a strictly greater magnitude than $\mathbb{E}[S(n)]$,
our approach derives asymptotically tighter bounds.
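The step from Theorem 2 to the closed-form bound of Theorem 3 can be spelled out as follows. This is a reader's sketch, assuming the readings $\lambda(\alpha) = 8(w(\alpha)-1)/(w(\alpha)^2 (M_2 - M_1)^2)$ and $w(\alpha) = 2\alpha/(1+\alpha)$, and using (A2) so that the minimum defining $t_*(\alpha, n^*)$ is attained at $n = n^*$:

```latex
\begin{align*}
t_*(\alpha,n^*)\bigl(f(\alpha,n^*) - \alpha\,\mathbb{E}[p(n^*)]\bigr)
  &= \frac{\lambda(\alpha)\,\bigl(w(\alpha)-\alpha\bigr)\,\mathbb{E}[p(n^*)]}{\mathbb{E}[S(n^*)]}, \\
w(\alpha) - 1 &= \frac{\alpha-1}{1+\alpha}, \qquad
w(\alpha) - \alpha = -\frac{\alpha(\alpha-1)}{1+\alpha}, \\
\frac{8\,(w(\alpha)-1)\,(w(\alpha)-\alpha)}{w(\alpha)^2}
  &= -8\cdot\frac{\alpha(\alpha-1)^2}{(1+\alpha)^2}\cdot\frac{(1+\alpha)^2}{4\alpha^2}
   = -\frac{2(\alpha-1)^2}{\alpha}.
\end{align*}
```

Dividing the last line by $(M_2 - M_1)^2$ and multiplying by $\mathbb{E}[p(n^*)]/\mathbb{E}[S(n^*)]$ gives the exponent of the tail bound. Under these assumptions, this choice of $w(\alpha)$ is also the maximizer of $(w-1)(\alpha-w)/w^2$ over $w$, which is why it yields the smallest exponent obtainable from this family.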

Below, we apply the theorem above to prove tail bounds for QuickSelect
(Example 2) and QuickSort (Example 3).


Example 6. For QuickSelect, its canonical form is
$p(n; 2) = n + p(\mathit{size}_1(n))$, where $\mathit{size}_1(n)$ follows
muniform(n). Solving the recurrence relation, we obtain that
$\mathbb{E}[p(n)] = 4 \cdot n$. We further find that this PRR satisfies (A1)
with the two constants $M_1 = -1$, $M_2 = 1$. Note that the PRR obviously
satisfies (A2). Hence, we apply Theorem 3 and derive the tail bound for every
sufficiently large $\alpha$:

$$\Pr[C_\tau \ge 4 \cdot \alpha \cdot n^*] \le \exp\Bigl(-\frac{2(\alpha - 1)^2}{\alpha}\Bigr).$$
On the other hand, Karp’s cookbook has the tail bound

$$\Pr[C_\tau \ge 4 \cdot \alpha \cdot n^*] \le \exp(1.15 - 1.12 \cdot \alpha).$$

Our bound is asymptotically the same as Karp’s but has a better coefficient.
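The comparison in Example 6 can be reproduced numerically. The sketch below assumes that the exponent of the Theorem 3 bound reads $-2(\alpha-1)^2 / (\alpha (M_2 - M_1)^2)$ times the ratio $\mathbb{E}[p(n^*)]/\mathbb{E}[S(n^*)]$; the function names and parameter names are hypothetical.

```python
def theorem3_exponent(alpha, m1, m2, ratio):
    """Exponent of the Theorem 3 bound (assumed form, see the lead-in).

    ratio stands for E[p(n*)] / E[S(n*)]; m1 and m2 are the constants of (A1).
    """
    return -2 * (alpha - 1) ** 2 / (alpha * (m2 - m1) ** 2) * ratio


def karp_quickselect_exponent(alpha):
    """Exponent of Karp's bound for the limit 4 * alpha * n*, as in Example 6."""
    return 1.15 - 1.12 * alpha


if __name__ == "__main__":
    # QuickSelect: E[p(n)] = 4n and S(n) = n give ratio 4; constants -1 and 1.
    for alpha in (2, 5, 10, 50):
        print(alpha, theorem3_exponent(alpha, -1, 1, 4), karp_quickselect_exponent(alpha))
```

Both exponents are linear in the coefficient for large values, with slopes close to 2 and 1.12 respectively, which matches the remark that the two bounds agree asymptotically up to the coefficient.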


Example 7. For QuickSort, its canonical form is
$p(n; 2) = n + p(\mathit{size}_1(n)) + p(\mathit{size}_2(n))$, where
$\mathit{size}_1(n)$ follows muniform(n) and
$\mathit{size}_2(n) = n - 1 - \mathit{size}_1(n)$. Similar to the example above,
we first calculate $\mathbb{E}[p(n)] = 2 \cdot n \cdot \ln n$. Note that this
PRR also satisfies the two assumptions above with the two constants
$M_1 = -2\log 2$, $M_2 = 1$. Hence, for every sufficiently large $\alpha$, we
can derive the tail bound as follows:

$$\Pr[C_\tau \ge 2 \cdot \alpha \cdot n^* \cdot \ln n^*] \le \exp\Bigl(-\frac{4(\alpha - 1)^2}{(1 + 2\log 2)^2\,\alpha} \cdot \ln n^*\Bigr).$$

On the other hand, Karp’s cookbook has the tail bound

$$\Pr[C_\tau \ge 2 \cdot \alpha \cdot n^* \cdot \ln n^*] \le \exp(-\alpha + 0.5).$$

Note that our tail bound is tighter than Karp’s by a $\ln n^*$ factor.


From the generality of Markov's inequality, our theoretical approach can
handle general PRRs with three or more sub-procedure calls. However, the tail
bounds derived from Theorem 3 are still not tight since the theorem only uses the
expectation and bound of the given distribution. For example, for QuickSelect,
the tightest known bound exp(−Θ(α·ln α)) [15] is tighter than that derived
from Theorem 3. Below, we present an algorithmic approach that fully utilizes
the distribution information and derives tight tail bounds that can match [15].


4 An Algorithmic Approach 


In this section, we demonstrate an algorithmic implementation for our theoretical
approach (Theorem 2). Our algorithm synthesizes the functions t, f through
templates and a refined estimation of the exponential terms from the inequality
(4). The estimation is via integration and the monotonicity of the template.
Below we fix a PRR p(n; c_p) in the canonical form (2) and a time limit α·κ(n*).

Recall that to apply Theorem 2, one needs to find functions t, f that satisfy
the constraint (4). Thus, the first step of our algorithm is to set up
pseudo-monomial templates for f(α,n) and t(α,n) in the following form:


$$f(\alpha,n) := c_f \cdot \alpha^{p_f} \cdot \ln^{q_f}\alpha \cdot n^{u_f} \cdot \ln^{v_f} n \tag{5}$$


$$t(\alpha,n) := c_t \cdot \alpha^{p_t} \cdot \ln^{q_t}\alpha \cdot n^{u_t} \cdot \ln^{v_t} n \tag{6}$$


In the template, p_f, q_f, u_f, v_f, p_t, q_t, u_t, v_t are given integers, and
c_f, c_t > 0 are unknown positive coefficients to be solved. For several compatibility
reasons (see Propositions 1 and 2 in the following), we require that u_f, v_f ≥ 0 and
u_t, v_t ≤ 0. We say that the concrete values c̄_f, c̄_t for the unknown coefficients
c_f, c_t > 0 are valid if the concrete functions f, t obtained by substituting c̄_f, c̄_t for
c_f, c_t in the templates (5) and (6) satisfy the constraint (4) for every sufficiently
large α, n* ≥ 0 and all c_p ≤ n ≤ n*.

We consider the pseudo-polynomial template since the runtime behavior
of randomized algorithms can be mostly captured by pseudo-polynomials. We
choose monomial templates since our interest is the asymptotic magnitude of
the tail bound. Thus, only the monomial with the highest degrees matters.

Our algorithm searches the values for p_f, q_f, u_f, v_f, p_t, q_t, u_t, v_t by an
enumeration within a bounded range {−B, ..., B}, where B is a manually specified
positive integer. To avoid exhaustive enumeration, we use the following
proposition to prune the search space.


Proposition 1. Suppose that we have functions t, f : [0,∞) × N → [0,∞) that
fulfill the constraint (4). Then it holds that (i) (p_f, q_f) ≤ (1,0) and
(p_t, q_t) ≥ (−1,0), and (ii) f(α,n) = Ω(E[p(n)]), f(α,n) = O(κ(n)) and
t(α,n) = Ω(κ(n)^{−1}) for any fixed α > 0, where we write (a,b) ≤ (c,d) for the
lexicographic order, i.e., (a < c) ∨ (a = c ∧ b ≤ d).


Proof. Except for the constraint that f(α,n) = Ω(E[p(n)]), the other constraints
simply ensure that the tail bound is exponentially decreasing. To see
why f(α,n) = Ω(E[p(n)]), we apply Jensen's inequality [27] to (4) and obtain
f(n) ≥ E[Ex(n | f)] = E[S(n) + Σ_i f(size_i(n))]. Then we imitate the proof of
Theorem 2 and derive that f(n) ≥ E[p(n)].

Proposition 1 shows that it suffices to consider (i) the choice of u_f, v_f that
makes the magnitude of f lie between E[p(n)] and κ(n), (ii) the choice of
u_t, v_t that makes the magnitude of t^{−1} lie within κ(n), and (iii) the choice of
p_f, q_f, p_t, q_t that fulfills (p_f, q_f) ≤ (1,0), (p_t, q_t) ≥ (−1,0). Note that an
over-approximation of E[p(n)] can be either obtained manually or derived from
automated approaches [9].


Example 8. Consider the QuickSelect example (Example 2). Suppose we are interested
in the tail bound Pr[C_τ ≥ α·n], and we enumerate the eight integers in the
template from −1 to 1. Since E[p(n)] = 4·n, by the proposition above, we must
have that (u_f, v_f) = (1,0), (u_t, v_t) ≥ (−1,0), (p_t, q_t) ≥ (−1,0), (p_f, q_f) ≤ (1,0).
This reduces the number of choices for the template from 1296 to 128, where
these numbers are automatically generated by our implementation. A choice is


$$f(\alpha,n) := c_f\cdot\alpha\cdot(\ln\alpha)^{-1}\cdot n \quad\text{and}\quad t(\alpha,n) := c_t\cdot\ln\alpha\cdot n^{-1}.$$


In the second step, our algorithm solves the unknown coefficients c_t, c_f in
the template. Once they are solved, our algorithm applies Theorem 2 to obtain
the tail bound. In detail, our algorithm computes t_*(α,n*) as the minimum of
t(α,n) over c_p ≤ n ≤ n*, and by u_t, v_t ≤ 0, t_*(α,n*) is simply t(α,n*), so that
we obtain the tail bound u(α,n*) = exp(t(α,n*)·(f(α,n*) − α·κ(n*))).


Example 9. Continue with Example 8. Suppose we have successfully found that
c̄_f = 2, c̄_t = 1 is a valid concrete choice for the unknown coefficients in the
template. Then t_*(α,n*) is t(α,n*) = ln α·(n*)^{−1}, and we have the tail bound
u(α,n*) = exp(2·α − α·ln α), which has a better magnitude than the tail bound
by Karp's method and our Theorem 3 (see Example 6).
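
The computation in Example 9 can be replayed numerically. The following is a minimal sketch, assuming κ(n) = n for the time limit α·n; the helper names `template` and `tail_bound` are hypothetical and not part of the authors' implementation.

```python
import math

def template(c, p, q, u, v):
    """Pseudo-monomial c * a^p * ln(a)^q * n^u * ln(n)^v, as in (5) and (6)."""
    def g(a, n):
        return c * a**p * math.log(a)**q * n**u * math.log(n)**v
    return g

# Concrete choice of Example 9: f = 2 * a * (ln a)^-1 * n and t = ln a * n^-1.
f = template(2.0, 1, -1, 1, 0)
t = template(1.0, 0, 1, -1, 0)

def kappa(n):
    # Assumption: the time limit of interest is a * kappa(n*) with kappa(n) = n.
    return n

def tail_bound(a, n_star):
    # u(a, n*) = exp(t(a, n*) * (f(a, n*) - a * kappa(n*)))
    return math.exp(t(a, n_star) * (f(a, n_star) - a * kappa(n_star)))

for a in (10.0, 12.0):
    closed_form = math.exp(2 * a - a * math.log(a))
    print(a, tail_bound(a, 17), closed_form)
```

The value does not depend on n*, which matches the closed form exp(2·α − α·ln α) stated in the example.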


Our algorithm follows the guess-and-check paradigm. The guess procedure
explores possible values c̄_f, c̄_t for c_f, c_t and invokes the check procedure to verify
whether the current choice is valid. Below we present the guess procedure in
Sect. 4.1, and the check procedure in Sect. 4.2.


4.1 The Guess Procedure Guess(f, t)


The pseudocode for our guess procedure Guess(f, t) is given in Algorithm 1. In
detail, it first receives a positive integer M as the doubling and halving number
(Line 1), then iteratively enumerates possible values for the unknown coefficients
c_f and c_t by doubling and halving for M times (Line 3 - Line 4), and finally
calls the check procedure (Line 5). It is justified by the following theorem.


Theorem 4. Given the template for f(α,n) and t(α,n) as in (5) and (6), if
c̄_f, c̄_t are valid choices, then (i) for every k > 1, k·c̄_f, c̄_t remain valid,
and (ii) for every 0 < k < 1, c̄_f, k·c̄_t remain valid.


By Theorem 4, if the check procedure is sound and complete (i.e.,
CheckCond always terminates and c̄_f, c̄_t fulfills the constraint (4) iff
CheckCond(c̄_f, c̄_t) returns true), then the guess procedure is guaranteed to find
a solution c̄_f, c̄_t (if it exists) when the parameter M is large enough.


Algorithm 1: Guess Procedure

  Input : Template for f(α,n) and t(α,n) as in (5) and (6)
  Output: c̄_f, c̄_t > 0 for (5) and (6)
  1 Parameter: M for the maximum steps of doubling and halving.
  2 Procedure Guess(f, t):
  3   for c̄_t := 1, 2^{−1}, ..., 2^{−M} do
  4     for c̄_f := 1/2, 1, 2, ..., 2^{M−1} do
  5       if CheckCond(c̄_f, c̄_t) then
  6         Return (c̄_f, c̄_t)


Example 10. Continuing with Example 8, suppose M = 2. We enumerate c̄_f
from {1/2, 1, 2}, and c̄_t from {1, 1/2, 1/4}. We try every possible combination, and we
find that CheckCond(2, 1) returns true. Thus, we return (2, 1) as the result. In
Sect. 4.2, we will show how to conclude that CheckCond(2, 1) is true.
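
The enumeration order of Algorithm 1 can be sketched as follows. This is only an illustration: `check_cond` is passed in as a black box, and the `toy_check` oracle below is a hypothetical stand-in that merely respects the monotonicity of Theorem 4; it is not the check procedure of Sect. 4.2.

```python
from typing import Callable, Optional, Tuple

def guess(check_cond: Callable[[float, float], bool],
          M: int) -> Optional[Tuple[float, float]]:
    """Enumerate (c_f, c_t) by doubling and halving, as in Algorithm 1."""
    for j in range(0, M + 1):        # c_t = 1, 1/2, ..., 2^-M
        c_t = 2.0 ** (-j)
        for i in range(-1, M):       # c_f = 1/2, 1, 2, ..., 2^(M-1)
            c_f = 2.0 ** i
            if check_cond(c_f, c_t):
                return c_f, c_t
    return None                      # no valid choice within the explored range

def toy_check(c_f: float, c_t: float) -> bool:
    # Hypothetical oracle: valid iff c_f is large enough and c_t small enough.
    return c_f >= 2.0 and c_t <= 1.0

print(guess(toy_check, 2))
```

With M = 2 the candidates are exactly those listed in Example 10, and the toy oracle is first satisfied at (2, 1).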


4.2 The Check Procedure CheckCond(c̄_f, c̄_t)


The check procedure takes as input the concrete values c̄_f, c̄_t for the unknown
coefficients in the template, and outputs whether they are valid. It is the most
involved part of our algorithm because deciding the validity of the
constraint (4) involves the composition of polynomials, exponentiation and
logarithms. Whether a sound and complete decision procedure exists for such
validity is a long-standing open problem [1,33].

To circumvent this difficulty, the check procedure first strengthens the original
constraint (4) into a canonical constraint with a specific form, so that a
decision algorithm that is sound and complete up to any additive error applies.
Below we fix a PRR with procedure p in the canonical form (2). We also discuss
possible extensions for the check procedure in Remark 1.


The Canonical Constraint. We first present the canonical constraint Q(α,n)
and how to decide it. The constraint is given by (where
∀^∞ α means “for all sufficiently large α”, or formally ∃α_0. ∀α ≥ α_0)


$$Q(\alpha,n) := \forall^{\infty}\alpha.\ \forall n \ge c_p.\ \sum_{i=1}^{k} \gamma_i\cdot\exp(f_i(\alpha) + g_i(n)) \le 1 \tag{7}$$


subject to:


(C1) For each 1 ≤ i ≤ k, γ_i > 0 is a positive constant, f_i(α) is a
pseudo-polynomial in α, and g_i(n) is a pseudo-polynomial in n.
(C2) For each 1 ≤ i ≤ k, the exponents for n and ln n in g_i(n) are non-negative.


We use Q_L(α,n) to represent the summation term Σ_{i=1}^{k} γ_i·exp(f_i(α) + g_i(n))
in (7). Below we show that this can be checked by the algorithm Decide up to
any additive error. We present an overview of this algorithm. We also present
its pseudo-code in Algorithm 2.

The algorithm Decide requires an external function NegativeLB(P(n)) that
takes as input a pseudo-polynomial P(n) and outputs an integer T_n^* such that
P(n) ≤ 0 for every n ≥ T_n^*, or outputs +∞ if no such T_n^* exists. The idea of this
function is to apply the monotonicity of pseudo-polynomials. With the function
NegativeLB(P(n)), the algorithm Decide consists of two steps as follows.

First, we can change the bound of n from [c_p, ∞) into [c_p, T_n], where T_n is
a constant, without affecting the soundness and completeness. This is achieved
by the observation that either: (i) we can conclude Q(α,n) does not hold, or (ii)
there is an integer T_n such that Q_L(α,n) is non-increasing when n ≥ T_n. Hence,
it suffices only to consider c_p ≤ n ≤ T_n. Below we show how to compute T_n by
case analysis of the limit M_i of g_i(n) as n → ∞, for each 1 ≤ i ≤ k.


– If M_i = +∞, then exp(g_i(n) + f_i(α)) could be arbitrarily large when n → ∞.
As a result, we can conclude that Q(α,n) does not hold.

– Otherwise, by (C2), either g_i(n) is a constant function, or M_i = −∞. In both
cases, g_i(n) is non-increasing for every sufficiently large n. More precisely,
there exists L_i such that g_i′(n) ≤ 0 for every n ≥ L_i, where g_i′(n) is the
derivative of g_i(n). Moreover, we can invoke NegativeLB(g_i′(n)) to get L_i.


Finally, we set T_n as the maximum of the L_i's and c_p.

Second, for every integer c_p ≤ n̄ ≤ T_n, we substitute n with n̄ to eliminate
n in Q(α,n). Then, each exponent f_i(α) + g_i(n̄) becomes a pseudo-polynomial
solely over α. Since we are only concerned with sufficiently large α, we can compute the
limit R_n̄ of Q_L(α, n̄) as α → ∞. We decide based on the limit R_n̄ as follows.

– If R_n̄ < 1 for every c_p ≤ n̄ ≤ T_n, we conclude that Q(α,n) holds.
– If R_n̄ ≥ 1 for some c_p ≤ n̄ ≤ T_n, we conclude that Q(α,n) does not hold to
ensure soundness.


Algorithm 2: The decision procedure for canonical constraints

  Input : A canonical constraint Q(α,n) in the form of (7)
  Output: Decide whether Q(α,n) holds.
  Procedure Decide(Q(α,n)):
    T_n := c_p                                  // The first step
    for i := 1, 2, ..., k do
      M_i := the limit of g_i(n) as n → ∞
      if M_i = +∞ then
        Return False
      else
        g_i′(n) := the derivative of g_i(n)
        T_n := max{T_n, NegativeLB(g_i′(n))}
    for n̄ := c_p, ..., T_n do                   // The second step
      R := 0
      for i := 1, 2, ..., k do
        A := the limit of f_i(α) + g_i(n̄) as α → ∞
        if A = +∞ then
          Return False
        else
          R := R + γ_i · exp(A)
      if R ≥ 1 then Return False
    Return True


Algorithm Decide is sound, and complete up to any additive error, as is
illustrated by the following theorem.


Theorem 5. Algorithm Decide has the following properties:


– (Completeness) If Q(α,n) does not hold for infinitely many α and some n ≥
c_p, then the algorithm returns false.

– (Soundness) For every ε > 0, if Q_L(α,n) ≤ 1 − ε for all sufficiently
large α and all n ≥ c_p, then the algorithm returns true.
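
The second step of Decide (the loop over n̄) can be illustrated with a small sketch. It assumes that T_n has already been obtained from the first step, that c_p ≥ 1, and that pseudo-polynomials are encoded as lists of (coefficient, degree, log-degree) triples; this encoding and the function names are our own illustrative choices, not the authors' implementation.

```python
import math

# A pseudo-polynomial in x is a list of terms (coef, deg, logdeg),
# each standing for coef * x**deg * ln(x)**logdeg.

def limit_at_infinity(poly):
    """Limit of a pseudo-polynomial as its variable tends to +infinity."""
    merged = {}
    for coef, deg, logdeg in poly:
        merged[(deg, logdeg)] = merged.get((deg, logdeg), 0.0) + coef
    live = {k: c for k, c in merged.items() if c != 0}
    if not live:
        return 0.0
    top = max(live)                      # lexicographic order on (deg, logdeg)
    if top > (0, 0):                     # an unbounded term dominates
        return math.inf if live[top] > 0 else -math.inf
    return live.get((0, 0), 0.0)         # all non-constant terms vanish

def evaluate(poly, x):
    """Value of a pseudo-polynomial at a concrete point x >= 1."""
    return sum(c * x**d * math.log(x)**l for c, d, l in poly)

def decide_second_step(terms, c_p, T_n):
    """terms: list of (gamma_i, f_i, g_i); mirrors the second loop of Algorithm 2."""
    for n_bar in range(c_p, T_n + 1):
        R = 0.0
        for gamma, f, g in terms:
            # f_i(alpha) + g_i(n_bar) is a pseudo-polynomial in alpha only.
            exponent = f + [(evaluate(g, n_bar), 0, 0)]
            A = limit_at_infinity(exponent)
            if A == math.inf:
                return False
            R += gamma * math.exp(A)     # exp(-inf) evaluates to 0.0
        if R >= 1:
            return False
    return True
```

For instance, a single term with γ = 1, f(α) = −α and g(n) = 0 yields the limit 0 for every n̄, so the sketch accepts; a single term with γ = 2 and constant exponent 0 yields the limit 2 and is rejected.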


The Strengthening Procedure. Then we show how to strengthen the constraint
(4) into the canonical constraint (7), so that Algorithm Decide applies.
We rephrase (4) as


$$\mathbb{E}\Big[\exp\Big(t(\alpha,n)\cdot\big(S(n) + \sum_{i=1}^{r} f(\alpha,\mathrm{size}_i(n)) - f(\alpha,n)\big)\Big)\Big] \le 1 \tag{8}$$


and consider two functions f, t obtained by substituting the concrete values c̄_f, c̄_t
for unknown coefficients into the templates (5) and (6). We observe that the joint
distribution of the random quantities S(n), r ∈ {1,2} and size_1(n), ..., size_r(n)
in the canonical form (2) over PRRs can be described by several probabilistic
branches {c_1 : B_1, ..., c_k : B_k}, which correspond to the probabilistic choice
commands in the PRR. Each probabilistic branch B_i has a constant probability
c_i, a deterministic pre-processing time S_i(n), a fixed number of subprocedure
calls r_i, and a probability distribution for the variable v. The strengthening first
handles each probabilistic branch, and then combines the strengthening results
of every branch into a single canonical constraint.

The strengthening of each branch is an application of a set of rewriting rules. 
Intuitively, each rewriting step over-approximates and simplifies the expectation 
term in the LHS of (8). Through multiple steps of rewriting, we eventually obtain 
the final canonical constraint. Below we present the details of the strengthening 
for a single probabilistic branch with the single recursion case. The divide-and- 
conquer case follows a similar treatment, see the extended version for details. 

Consider the single recursion case r = 1 where a probabilistic branch has 
deterministic pre-processing time S(n), distribution dist for the variable v and 
passed size H(v,n) for the recursive call. We have a case analysis on the distri- 
bution dist as follows. 


– Case I: dist is an FSDPD discrete{c′_1 : expr_1, ..., c′_k : expr_k}, where v observes
as expr_i with probability c′_i. Then the expectation in (8) is exactly:

$$\sum_{i=1}^{k} c'_i\cdot\exp\big(t(\alpha,n)\cdot S(n) + t(\alpha,n)\cdot f(\alpha,H(\mathit{expr}_i,n)) - t(\alpha,n)\cdot f(\alpha,n)\big)$$

Thus it suffices to over-approximate the exponent X_i(α,n) := t(α,n)·S(n) +
t(α,n)·f(α, H(expr_i, n)) − t(α,n)·f(α,n) into the form subject to (C1)-(C2). For
this purpose, our strengthening repeatedly applies the following rewriting rules
(R1)-(R4) for which 0 < a < 1 and b > 0:

(R1) $f(\alpha, H(\mathit{expr}_i,n)) \le f(\alpha,n)$
(R2) $\ln(an - b) \le \ln n + \ln a$, $\quad \ln(an + b) \le \ln n + \ln(\min\{1, a + \tfrac{b}{c_p}\})$
(R3) $0 \le n^{-c} \le c_p^{-c}$, $\quad 0 \le \ln^{-c} n \le \ln^{-c} c_p$
(R4) $\lfloor \tfrac{n}{b} \rfloor \le \tfrac{n}{b}$, $\quad \lceil \tfrac{n}{b} \rceil \le \tfrac{n}{b} + \tfrac{b-1}{b}$

(R1) follows from the well-formedness 0 ≤ H(size_i, n) ≤ n and the monotonicity
of f(α,n) with respect to n. (R2)-(R4) are straightforward. Intuitively, (R1) can
be used to cancel the term f(α, H(size_i, n)) − f(α,n), (R2) simplifies the
sub-expression in ln, (R3) is used to remove n^{−c} and ln^{−c} n to satisfy the
restriction (C2) of the canonical constraint, and (R4) to remove floors and ceils. To apply
these rules, we consider two strategies below.


(S1-D) Apply (R1) and over-approximate X_i(α,n) as t(α,n)·S(n). Then, we
repeatedly apply (R3) to remove terms n^{−c} and ln^{−c} n.

(S2-D) Substitute f and t with the concrete functions f̄, t̄ and expand
H(expr_i, n). Then we first apply (R4) to remove all floors and ceils, and
repeatedly apply (R2) to replace all occurrences of ln(an + b) with ln n + ln C
for some constant C. By the previous replacement, the whole term X_i(α,n)
will be over-approximated as a pseudo-polynomial over α and n. Finally, we
eagerly apply (R3) to remove all terms n^{−c} and ln^{−c} n.

Our algorithm first tries to apply (S2-D). If it fails to derive a canonical
constraint, then we apply the alternative (S1-D) to the original constraint. If both
strategies fail, we report failure and exit the check procedure.


Example 11. Suppose v observes as {0.5 : n − 1, 0.5 : n − 2}, S(n) := ln n,
t(α,n) := ln α / ln n, f(α,n) := 4·(α/ln α)·n·ln n, and H(v,n) := v. We consider
applying both strategies to the first term expr_1 := n − 1 and X_1(α,n) :=
t(α,n)·(S(n) + f(α,n−1) − f(α,n)). If we apply (S1-D) to X_1, it will be
approximated as exp(ln α). If we apply (S2-D) to X_1, it will be first over-approximated
as

$$\frac{\ln\alpha}{\ln n}\cdot\Big(\ln n + 4\cdot\frac{\alpha}{\ln\alpha}\cdot v\cdot\ln n - 4\cdot\frac{\alpha}{\ln\alpha}\cdot n\cdot\ln n\Big),$$

then we substitute v = n − 1 and derive the final result exp(ln α − 4·α). Hence,
both strategies succeed.
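
The two over-approximations in Example 11 can be sanity-checked numerically. The sketch below evaluates the exact exponent X_1(α,n) for a few sample points (chosen arbitrarily, with α > 1 and n ≥ 3) and compares it with the exponents ln α − 4·α from (S2-D) and ln α from (S1-D); it is a spot check, not a proof.

```python
import math

def S(n):
    return math.log(n)

def t(a, n):
    return math.log(a) / math.log(n)

def f(a, n):
    return 4 * a / math.log(a) * n * math.log(n)

def X1(a, n):
    # Exact exponent for the branch expr_1 = n - 1 with H(v, n) = v.
    return t(a, n) * (S(n) + f(a, n - 1) - f(a, n))

def s2d_exponent(a):
    return math.log(a) - 4 * a      # exponent obtained by strategy (S2-D)

def s1d_exponent(a):
    return math.log(a)              # exponent obtained by strategy (S1-D)

for a in (3.0, 10.0, 50.0):
    for n in (3, 10, 1000):
        assert X1(a, n) <= s2d_exponent(a) <= s1d_exponent(a)
        print(a, n, X1(a, n), s2d_exponent(a), s1d_exponent(a))
```

On these samples (S2-D) gives the smaller, hence more precise, exponent, which is in line with the algorithm trying (S2-D) first.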


– Case II: dist is uniform(n) or muniform(n). Note that H(v,n) is linear with
respect to v, thus H(v,n) is a bijection over v for every fixed n. Hence, if v
observes as uniform(n), then


$$\mathbb{E}[\exp(t(\alpha,n)\cdot f(\alpha,H(v,n)))] \le \frac{1}{n}\sum_{v=0}^{n-1}\exp(t(\alpha,n)\cdot f(\alpha,v)) \tag{9}$$


If v observes as muniform(n), a similar inequality holds by replacing 1/n with 2/n.
Since f(α,v) is a non-decreasing function with respect to v, we further
over-approximate the summation in (9) by the integral ∫_0^n exp(t(α,n)·f(α,v)) dv.


Example 12. Continue with Example 10. We need to check the choice
t(α,n) = ln α / n and f(α,n) = (2·α/ln α)·n. By the inequality (9), we expand the
constraint (8) into 2·exp(ln α − 2·α)·(1/n)·Σ_{v=0}^{n−1} exp(2·α·v/n). By integration, it is
further over-approximated as 2·exp(ln α − 2·α)·(1/n)·∫_0^n exp(2·α·v/n) dv.


Note that we still need to resolve the integration of an exponential function
whose exponent is a pseudo-monomial over α, n, v. Below we denote by d_v the
degree of the variable v and by ℓ_v the degree of ln v. We first list the situations
where the integral can be computed exactly.


– If (d_v, ℓ_v) = (1,0), then the exponent can be expressed as W(α,n)·v, where
W(α,n) is a pseudo-monomial over α and n. We can compute the integral as
(exp(n·W(α,n)) − 1)/W(α,n) and over-approximate it as exp(n·W(α,n))/W(α,n)
by removing −1 in the numerator.

– If (d_v, ℓ_v) = (0,1), then the exponent is of the form W(α,n)·ln v. We follow
a similar procedure to the case above and obtain the over-approximation
n·exp(W(α,n)·ln n)/(W(α,n) + 1).

– If (d_v, ℓ_v) = (0,0), then the result is trivially n·exp(W(α,n)).
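
The summation-to-integral step and the closed forms above can be checked numerically. The sketch below assumes the summation ranges over v = 0, ..., n−1 and the integral over [0, n]; the function names are hypothetical, and the last lines replay the QuickSelect instance of Example 12, where W(α,n) = 2·α/n.

```python
import math

def sum_linear(W, n):
    # Sum of exp(W * v) over v = 0..n-1, i.e. the case (d_v, l_v) = (1, 0).
    return sum(math.exp(W * v) for v in range(n))

def bound_linear(W, n):
    # Over-approximation exp(n * W) / W of the integral over [0, n].
    return math.exp(W * n) / W

def sum_log(W, n):
    # Sum of exp(W * ln v) over v = 1..n-1, i.e. the case (d_v, l_v) = (0, 1);
    # the v = 0 term is taken as 0, which assumes W > 0.
    return sum(math.exp(W * math.log(v)) for v in range(1, n))

def bound_log(W, n):
    # Integral of v**W over [0, n], written as n * exp(W * ln n) / (W + 1).
    return n * math.exp(W * math.log(n)) / (W + 1)

alpha, n = 10.0, 17
W = 2 * alpha / n
exact_integral = (math.exp(W * n) - 1) / W
assert sum_linear(W, n) <= exact_integral <= bound_linear(W, n)
assert sum_log(1.5, n) <= bound_log(1.5, n)

# QuickSelect instance: 2 * exp(ln a - 2a) * (1/n) * exp(2a) / (2a/n)
lhs_bound = 2 * math.exp(math.log(alpha) - 2 * alpha) * (1 / n) * bound_linear(W, n)
print(lhs_bound)
```

Up to floating-point rounding, the final value is 1, which is the over-approximation of the left-hand side of (8) for this instance.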


Then we handle the situation where the exact computation of the integral is
infeasible. In this situation, the strengthening further over-approximates the
integral into simpler forms by first replacing ln v with ln n, and then replacing
v with n to reduce the degrees ℓ_v and d_v. Eventually, the exponent in the
integral boils down to one of the three situations above (where the integral can be
computed exactly), and the strengthening returns the exact value of the
integral.


Example 13. Continue with Example 12. We express the exponent as (2·α/n)·v.
Thus, we can plug 2·α/n into W(α,n) and obtain the integration result
n·exp(2·α)/(2·α). Furthermore, we can simplify the formula in Example 12 as
exp(ln α)/α.


In the end, we move the term 1/n (or 2/n) that comes from the uniform (or
muniform) distribution and the coefficient term W(α,n) into the exponent. If
we move these terms directly, it may produce ln ln n and ln ln α that come from
taking the logarithm of ln n and ln α. Hence, we first apply ln c_p ≤ ln n ≤ n and
1 ≤ ln α ≤ α to remove all terms ln n and ln α outside the exponent (e.g., me
is over-approximated as mg). After the over-approximation, the terms outside
the exponentiation form a polynomial over α and n; we can then move these
terms into the exponent by taking the logarithm. Finally, we apply (R3) in Case
I to remove n^{−c} and ln^{−c} n. If we fail to obtain the canonical constraint, the
strengthening reports failure.


Example 14. Continue with Example 13. We move the term 1/α into the
exponentiation and simplify the over-approximation result as exp(ln α − ln α) = 1.
As a result, we over-approximate the LHS of (8) as 1 and we conclude that
CheckCond(2, 1) holds.


The details of the divide-and-conquer case are similar and omitted. Furthermore,
we present how to combine the strengthening results for different branches
into a single canonical constraint. Suppose for every probabilistic branch B_i,
we have successfully obtained the canonical constraint Q_{L,i}(α,n) ≤ 1 as the
strengthening of the original constraint (8). Then, the canonical constraint for
the whole distribution is Σ_{i=1}^{k} c_i·Q_{L,i}(α,n) ≤ 1. Intuitively, there is probability
c_i for the branch B_i, thus the combination follows by simply expanding the
expectation term.

A natural question is whether our algorithm always succeeds in
obtaining the canonical constraint. We have the proposition as follows.


Proposition 2. If the template for t has a lower magnitude than S(n)^{−1} for
every branch, then the rewriting always succeeds.


Proof. We first consider the single recursion case. When dist is FSDPD, we can
apply (S1-D) to over-approximate the exponent as t(α,n)·S(n). Since t(α,n)
has a lower magnitude than S(n)^{−1}, by further applying (R3) to eliminate
n^{−c} and ln^{−c} n, we obtain the canonical constraint. If dist is uniform(n) or
muniform(n), we observe that the over-approximation result for the integral is
either exp(t(α,n)·f(α,n))/W(α,n) (when d_v > 0) or
n·exp(t(α,n)·f(α,n))/(W(α,n) + 1) (when d_v = 0). Thus, we can
cancel the term f(α,n) in the exponent and obtain the canonical constraint by
the subsequent steps. The proof is the same for the divide-and-conquer case.


By Proposition 2, we restrict u_t, v_t ≤ 0 in the template to ensure our algorithm
never fails.


Remark 1. Our algorithm can be extended to support piecewise uniform distri- 
butions (e.g. each of 0,...,n/2 with probability = and each of n/2+1,...,n—1 
with probability +) by handling each piece separately. 


5 Experimental Results 


In this section, we evaluate our algorithm over classical randomized algorithms
such as QuickSort (Example 3), QuickSelect (Example 2), DiameterComputation
[26, Chapter 9], RandomizedSearch [24, Chapter 9], ChannelConflictResolution
[22, Chapter 13], examples such as Rdwalk and Rdadder in the literature [7],
and four manually-crafted examples (MC1 to MC4). For each example, we
manually compute its expected running time for the pruning.

We implemented our algorithm in C++.
We choose B = 2 (as the bounded range for the template), M = 4 (in the guess
procedure), Q = 8 (for the number of parts in the integral), and prune the
search space by Theorem 1. All results were obtained on an Ubuntu 18.04 machine
with an 8-Core Intel i7-7900x Processor (4.30 GHz) and 40 GB of RAM.

We report the tail bound derived by our algorithm in Table 1, where “Benchmark”
lists the benchmarks, “α·κ(n*)” lists the time limit of interest, “Our bound”
lists the tail bound by our approach, “Time(s)” lists
the runtime (in seconds) of our approach, and “Karp's bound” lists the bounds by
Karp's method. From the table, our algorithm consistently derives asymptotically
tighter tail bounds than Karp's method. Moreover, all these bounds are obtained
in a few seconds, demonstrating the efficiency of our algorithm. Furthermore, our
algorithm obtains bounds with tighter magnitude than our completeness theorem
(Theorem 3) in 9 benchmarks, and bounds with the same magnitude in the
others.

Fig. 2. Plot for QuickSelect (the tail bound plotted against α).

For an intuitive comparison, we also report the concrete bounds and their
plots of our method and Karp's method. We choose three concrete choices of α
and n* and plot the concrete bounds over 10 ≤ α ≤ 15, n* = 17. For concrete
bounds, we also report the ratio (Karp's bound)/(our bound) to show the strength of our method.
Due to space limitations, we only report the results for QuickSelect (Example 2)
in Table 2 and Fig. 2.

Table 1. Experimental Results


Benchmark   | α·κ(n*) in (1) | Our bound                  | Time(s) | Karp's bound
QuickSelect | α·n*           | exp(2·α − α·ln α)          | 0.03    | exp(1.15 − 0.28·α)
QuickSort   | α·n*·ln n*     | exp((4 − α)·ln n*)         | 0.02    | exp(0.5 − 0.5·α)
L1Diameter  | α·n*           | exp(α − α·ln α)            | 0.03    | exp(1.39 − 0.69·α)
L2Diameter  | α·n*·ln n*     | exp(α − α·ln α)            | 0.03    | exp(1.39 − 0.69·α)
RandSearch  | α·ln n*        | exp((2·α − α·ln α)·ln n*)  | 0.03    | exp(−0.29·α·ln n*)
Channel     | α·n*           | exp((8 − α)·n*)            | 0.05    | exp(1 − 0.37·α)
Rdwalk      | α·n*           | exp((0.5 − α)·n*)          | 0.05    | exp(0.60 − 0.41·α)
Rdadder     | α·n*           | exp((4 − 0.5·α)·n*)        | 0.04    | Not applicable
MC1         | α·ln n*        | exp((α − α·ln α)·ln n*)    | 0.03    | exp(−0.69·α·ln n*)
MC2         | α·ln² n*       | exp((α − α·ln α)·ln n*)    | 0.03    | exp(−0.69·α·ln n*)
MC3         | α·n*·ln² n*    | exp(α − α·ln α)            | 0.03    | exp(1.15 − 0.28·α)
MC4         | α·n*           | exp(2·α − α·ln α)          | 0.04    | Not applicable


Table 2. Concrete Bounds for QuickSelect


Concrete choice  | Our bound | Karp's bound | Ratio
α = 10; n* = 13  | 0.0485    | 0.192        | 3.96
α = 11; n* = 15  | 0.0126    | 0.145        | 11.6
α = 12; n* = 17  | 0.00297   | 0.110        | 36.9
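
The entries of Table 2 can be recomputed from the QuickSelect row of Table 1. The sketch below plugs the three values of α into both closed forms; the function names are hypothetical, and the ratio is taken as Karp's bound divided by our bound, which is our reading of the table.

```python
import math

def our_bound(a):
    # QuickSelect row of Table 1: exp(2*a - a*ln a)
    return math.exp(2 * a - a * math.log(a))

def karp_bound(a):
    # QuickSelect row of Table 1: exp(1.15 - 0.28*a)
    return math.exp(1.15 - 0.28 * a)

for a, n_star in ((10, 13), (11, 15), (12, 17)):
    ours, karp = our_bound(a), karp_bound(a)
    # Neither closed form depends on n*; it is listed only to mirror Table 2.
    print(f"a={a}, n*={n_star}: ours={ours:.3g}, Karp={karp:.3g}, ratio={karp / ours:.3g}")
```

Both bounds are independent of n*, so each row of the table is determined by α alone.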


6 Related Work 


Karp’s Cookbook. Our approach is orthogonal to Karp’s cookbook method [21] 
since we base our approach on Markov’s inequality, and the core of Karp’s 
method is a dedicated proof for establishing that an intricate tail bound function 
is a prefixed point of the higher order operator derived from the given PRR. Fur- 
thermore, our automated approach can derive asymptotically tighter tail bounds 
than Karp’s method over all 12 PRRs in our benchmark. Our approach could 
also handle randomized preprocessing times, which is beyond the reach of Karp’s 
method. Since Karp’s proof of prefixed point is ad-hoc, it is non-trivial to extend 
his method to handle the randomized cost. Nevertheless, there are PRRs (e.g., 
Coupon-Collector) that can be handled by Karp’s method but not by ours. Thus, 
our approach provides a novel way to obtain asymptotically tighter tail bounds 
than Karp’s method. 

The recent work [30] extends Karp’s method for deriving tail bounds for 
parallel randomized algorithms. This method derives the same tail bounds as 
Karp’s method over PRRs with a single recursive call (such as QuickSelect) and 
cannot handle randomized pre-processing time. Compared with this approach, 
our approach derives tail bounds with tighter magnitude on 11/12 benchmarks. 


Custom Analysis. Custom analysis of PRRs [15,25] has successfully derived 
tight tail bounds for QuickSelect and QuickSort. Compared with the custom 
analysis that requires ad-hoc proofs, our approach is automated, has the gen- 
erality from Markov’s inequality, and is capable of deriving bounds identical or 
very close to the tail bounds from the custom analysis.

Probabilistic Programs. There are also relevant approaches in probabilistic
program verification. These approaches are either based on martingale con- 
centration inequalities (for exponentially-decreasing tail bounds) [7, 10-12, 19], 
Markov’s inequality (for polynomially-decreasing tail bounds) [8,23,31], fixed- 
point synthesis [32], or weakest precondition reasoning [4,20]. Compared with 
these approaches, our approach is dedicated to PRRs (a light-weight representa- 
tion of recursive probabilistic programs) and involves specific treatment of com- 
mon recursive patterns (such as randomized pivoting and divide-and-conquer) in 
randomized algorithms, while these approaches usually do not consider common 
recursion patterns in randomized algorithms. Below we have detailed technical 
comparisons with these approaches. 


— Compared with the approaches based on martingale concentration inequali- 
ties [7,10-12,19], our approach has the same root as them, since martingale 
concentration inequalities are often proved via Markov’s inequality. However, 
those approaches have more accuracy loss since these martingale concentra- 
tion inequalities usually make further relaxations after applying Markov’s 
inequality. In contrast, our automated approach directly handles the con- 
straint after applying Markov’s inequality by having a refined treatment of 
exponentiation and hence has better accuracy in deriving tail bounds. 

— Compared with the approaches [8, 23,31] that derive polynomially-decreasing 
tail bounds, our approach targets the sharper exponentially-decreasing tail 
bounds and hence is orthogonal. 

— Compared with the fixed-point synthesis approach [32], our approach is 
orthogonal as it is based on Markov’s inequality. Note that the approach [32] 
can only handle 3/12 benchmarks. 

— Compared with weakest precondition reasoning [4,20] that requires first spec- 
ifying the bound functions and then verifying the bound functions by proof 
rules related to fixed-point conditions, mainly with manual efforts, our app- 
roach can be automated and is based on Markov’s inequality rather than 
fixed point theorems. Although Karp’s method is also based on a particular 
tail bound function as a prefixed point and can thus be embedded into the 
weakest precondition framework, Karp’s proof of prefixed point requires deep 
insight, which is beyond existing proof rules. Moreover, even a slight relax- 
ation of the tail bound function into a simpler form in Karp’s method no 
longer keeps the bound function to be a prefixed point. Hence, the approach 
of the weakest precondition may not be suitable for deriving tail bounds.

Abstract. We present a compositional model checking algorithm for
Markov decision processes, in which they are composed in the categori- 
cal graphical language of string diagrams. The algorithm computes opti- 
mal expected rewards. Our theoretical development of the algorithm is 
supported by category theory, while what we call decomposition equali- 
ties for expected rewards act as a key enabler. Experimental evaluation 
demonstrates its performance advantages. 


Keywords: model checking - compositionality - Markov decision 
process - category theory - monoidal category - string diagram 


1 Introduction 


Probabilistic model checking is a topic that attracts both theoretical and practical 
interest. On the practical side, probabilistic system models can naturally accom- 
modate uncertainties inherent in many real-world systems; moreover, proba- 
bilistic model checking can give quantitative answers, enabling more fine-grained 
assessment than qualitative verification. Model checking of Markov decision pro- 
cesses (MDPs)—the target problem of this paper—has additional practical val- 
ues since it not only verifies a specification but also synthesizes an optimal control 
strategy. On the theoretical side, it is notable that probabilistic model check- 
ing has a number of efficient algorithms, despite the challenge that the problem 
involves continuous quantities (namely probabilities). See e.g. [1]. 

However, even those efficient algorithms can struggle when a model is enormous.
Models can easily become enormous (the so-called state-space explosion
problem) due to the growing complexity of modern verification targets. Models
that exceed the memory size of a machine for verification are common.

Among possible countermeasures to state-space explosion, one with both 
mathematical blessings and a proven track record is compositionality. It takes as 
input a model with a compositional structure—where smaller component models 
are combined, sometimes with many layers—and processes the model in a divide- 
and-conquer manner. In particular, when there is repetition among components, 
compositional methods can exploit the repetition and reuse intermediate results, 
leading to a clear performance advantage. 

Focusing our attention on MDP model checking, there have been many compositional
methods proposed for various settings. One example is [14]: it studies
ies probabilistic automata (they are only slightly different from MDPs) and in 
particular their parallel composition; the proposed method is a compositional 
framework, in an assume-guarantee style, based on multi-objective probabilis- 
tic model checking. Here, contracts among parallel components are not always 
automatically obtained. Another example is [11], where the so-called hierarchical 
model checking method for MDPs is introduced. It deals with sequential compo- 
sition rather than parallel composition; assuming what can be called parametric 
homogeneity of components—they must be of the same shape while parame- 
ter values may vary—they present a model-checking algorithm that computes a 
guaranteed interval for the optimal expected reward. 

In this work, inspired by these works and technically building on another 
recent work of ours [20], we present another compositional MDP model check- 
ing algorithm. We compose MDPs in string diagrams—a graphical language of 
category theory [15, Chap. XI] that has found applications in computer sci- 
ence [3,8,17]—that are more sequential than parallel. Our algorithm computes 
the optimal expected reward, unlike [11]. 

One key ingredient of the algorithm is the identification of compositional- 
ity as the preservation of algebraic structures; more specifically, we identify a 
compositional solution as a “homomorphism” of suitable monoidal categories.
This identification guided us in our development, explicating requirements of a 
desired compositional semantic domain (Sect. 2). 

Another key ingredient is a couple of decomposition equalities for reachabil- 
ity probabilities, extended to expected rewards (Sect. 3). Those for reachability 
probabilities are well-known—one of them is Girard’s execution formula [7] in 
linear logic—but our extension to expected rewards seems new. 

The last two key ingredients are combined in Sect. 4 to formulate a composi- 
tional solution. Here we benefit from general categorical constructions, namely 
the Int construction [10] and change of base [5,6]. 

We implemented the algorithm (it is called CompMDP) and present its exper- 
imental evaluation. Using the benchmarks inspired by real-world problems, we 
show that 1) CompMDP can solve huge models in realistic time (e.g. 10° posi- 
tions, in 6-130s); 2) compositionality does boost performance (in some ablation 
experiments); and 3) the choice of the degree of compositionality is important. 
The last is enabled in CompMDP by the operator we call freeze. 
[Figure 1 panels: (a) a task; (b) a room combines tasks; (c) a floor combines rooms; (d) a building combines floors; (e) a neighborhood combines buildings.]


Fig. 1. String diagrams of MDPs, an example (the Patrol benchmark in Sect. 5). 


Fig. 2. Sequential composition ;, sum ⊕, and loops of MDPs, illustrated.


Compositional Description of MDPs by String Diagrams. The calculus 
we use for composing MDPs is that of string diagrams. Figure 1 shows an example 
used in experiments. String diagrams offer two basic composition operations, 
sequential composition ; and sum ⊕, illustrated in Fig. 2. The rearrangement of
wires in A ⊕ B is for bundling up wires of the same direction. It is not essential.

We note that loops in MDPs can be described using these algebraic opera- 
tions, as shown in Fig. 2. We extend MDPs with open ends so that they allow 
such composition; they are called open MDPs. 

The formalism of string diagrams originates from category theory, specifically 
from the theory of monoidal categories (see e.g. [15, Chap. XI]). Capturing the
mathematical essence of the algebraic structure of arrow composition ∘ and tensor
product ⊗ (they correspond to ; and ⊕ in this work, respectively), monoidal
categories and string diagrams have found their application in a wide variety of


Compositional Probabilistic Model Checking with String Diagrams of MDPs 43 


scientific disciplines, such as quantum field theory [12], quantum mechanics and 
computation [8], linguistics [17], signal flow diagrams [3], and so on. 

Our reason for using string diagrams to compose MDPs is twofold. Firstly, 
string diagrams offer a rich metatheory—developed over the years together with 
its various applications—that we can readily exploit. Specifically, the theory cov- 
ers functors, which are (structure-preserving) homomorphisms between monoidal 
categories. We introduce a solution functor S: oMDP → S from a category
oMDP of open MDPs to a semantic category S that consists of solutions. We
show that the functor S preserves two composition operations, that is,


    S(A ; B) = S(A) ; S(B),    S(A ⊕ B) = S(A) ⊕ S(B),    (1)


where ; and ⊕ on the right-hand sides are semantic composition operations on
S. The equalities (1) are nothing but compositionality: the solution of the whole 
(on the left) is computed from the solutions of its parts (on the right). 

The second reason for using string diagrams is that they offer an expressive lan- 
guage for composing MDPs—one that enables an efficient description of a number 
of realistic system models—as we demonstrate with benchmarks in Sect. 5. 


Granularity of Semantics: A Challenge Towards Compositionality. Now
the main technical challenge is the design of a semantic domain S (it is a category
in our framework). We shall call it the challenge of granularity of semantics; it 
is encountered generally when one aims at compositional solutions. 


— The coarsest candidate for S is the original semantic domain; it consists of 
solutions and nothing else. This coarsest candidate is not enough most of the 
time: when components are composed, they may interact with each other via 
a richer interface than mere solutions. (Consider a team of two people. Its 
performance is usually not the sum of each member’s, since there are other 
affecting factors such as work style, personal character, etc.) 

— Therefore one would need to use a finer-grained semantic domain as S, which, 
however, comes with a computational cost: in (1), one will have to carry 
around bigger data as intermediate solutions S(A) and S(B); their semantic 
composition will become more costly, too. 


Therefore, in choosing S, one should find the smallest enrichment¹ of the original
semantic domain that addresses all relevant interactions between components 
and thus enables compositional solutions. This is a theoretical challenge. 

In this work, following our recent work [20] that pursued a compositional 
solution of parity games, we use category theory as guidance in tackling the 
above challenge. Our goal is to obtain a solution functor S: oMDP → S that
preserves suitable algebraic structures (see (1)); the specific notion of algebra of 
our interest is that of compact closed categories (compCC). 


— The category oMDP organizes open MDPs as a category. It is a compCC, 
and its algebraic operations are defined as in Fig. 2. 


1 Enrichment here is in the natural language sense; it has nothing to do with the 
technical notion of enriched category. 
— For the solution functor S to be compositional, the semantic category S must
itself be a compCC, that is, S has to be enriched so that the compCC operations
(; and ⊕) are well-defined.

— Once such a semantic domain S is obtained, choosing S and showing that it 
preserves the algebraic operations are straightforward. 


Specifically, we find that S must be enriched with reachability probabilities, in 
addition to the desired solutions (namely expected rewards), to be a compCC. 
This enrichment is based on the decomposition equalities we observe in Sect. 3. 
After all, our semantic category S is as follows: 1) an object is a pair of natural 
numbers describing an interface (how many entrances and exits); 2) an arrow is 
a collection of “semantics,” collected over all possible (memoryless) schedulers τ,
which records the expected reward that the scheduler τ yields when it traverses
from each entrance to each exit. The last “semantics” is enriched so that it 
records the reachability probability, too, for the sake of compositionality. 


Related Work. Compositional model checking is studied e.g. in [4,19,20]. 
Besides, probabilistic model checking is an actively studied topic; see [1, 
Chap. 10] for a comprehensive account. We shall make a detailed comparison 
with the works [11,14] that study compositional probabilistic model checking. 

The work [14] introduces an assume-guarantee reasoning framework for par- 
allel composition ||, as we already discussed. Parallel composition is out of our 
current scope; in fact, we believe that compositionality with respect to || requires 
a much bigger enrichment of a semantic domain S than mere reachability prob- 
abilities as in our work. The work [14] is remarkable in that its solution to this 
granularity problem—namely by assume-guarantee reasoning—is practically sen- 
sible (domain experts often have ideas about what contract to impose) and comes 
with automata-theoretic automation. That said, such contracts are not always 
automatically synthesized in [14], while our algorithm is fully automatic. 

The work [11] is probably the closest to ours in the type of composition 
(sequential rather than parallel) and automation. However, the technical bases 
of the two works are quite different: theirs is the theory of parametric MDPs [18], 
which is why their emphasis is on parametrized components and interval solu- 
tions; ours is monoidal categories and some decomposition equalities (Sect. 3). 

We note that the work [11] and ours are not strictly comparable. On the 
one hand, we do not need a crucial assumption in [11], namely that a locally 
optimal scheduler in each component is part of a globally optimal scheduler. The 
assumption limits the applicability of [11]: it practically forces each component
to have only one exit. The assumption does not hold in our benchmarks Patrol
and Wholesale (see Sect. 5). Our algorithm does not need the assumption since
it collects the semantics of all relevant memoryless schedulers. 

On the other hand, unlike [11], our algorithm is not parametric, so it cannot 
exploit the similarity of components if they only differ in parameter values. Note 
that the target problems are different, too (interval [11] vs. exact here). 


Notations. For natural numbers m and n, we let [m, n] := {m, m+1, ..., n−1, n};
as a special case, we let [m] := {1, 2, ..., m} (we let [0] = ∅ by convention).
The disjoint union of two sets X, Y is denoted by X + Y.
[Figure 3 diagram, summarized. Top row: bidirectional MDPs, oMDP = Int(roMDP) (compact closed), with the solution functor S := Int(S_r) into the semantic category S := Int(S_r). Middle row: unidirectional MDPs, S_r: roMDP → S_r (traced monoidal). Bottom row: unidirectional MCs, S^MC: roMC → S^MC (traced monoidal). The Int construction relates the middle row to the top row; change of base relates the bottom row to the middle row.]

Fig. 3. Categories of MDPs/MCs, semantic categories, and solution functors. 


2 String Diagrams of MDPs 


We introduce our calculus for composing MDPs, namely string diagrams of 
MDPs. Our formal definition is via their unidirectional and Markov chain (MC) 
restrictions. This apparent detour simplifies the theoretical development, allow- 
ing us to exploit the existing categorical infrastructure on (monoidal) categories. 


2.1 Outline 


We first make an overview of our technical development. Although we use some 
categorical terminologies, prior knowledge of them is not needed in this outline. 

Figure 3 is an overview of relevant categories and functors. The verification 
targets—open MDPs—are arrows in the compact closed category (compCC) 
oMDP. The operations ;, ⊕ of compCCs compose MDPs, as shown in Fig. 2.
Our semantic category is denoted by S, and our goal is to define a solution
functor oMDP → S that is compositional. Mathematically, such a functor with
the desired compositionality (cf. (1)) is called a compact closed functor.

Since its direct definition is tedious, our strategy is to obtain it from a
unidirectional rightward framework S_r: roMDP → S_r, which canonically induces
the desired bidirectional framework via the celebrated Int construction [10]. In
particular, the category oMDP is defined by oMDP = Int(roMDP); so are
the semantic category and the solution functor (S = Int(S_r), S = Int(S_r)).

Going this way, a complication that one would encounter in a direct definition
of oMDP (namely potential loops of transitions) is nicely taken care of by
the Int construction. Another benefit is that some natural equational axioms in
oMDP, such as the associativity of sequential composition ;, follow automatically
from those in roMDP, which are much easier to verify.

Mathematically, the unidirectional framework S_r: roMDP → S_r consists of
traced symmetric monoidal categories (TSMCs) and traced symmetric monoidal 
functors; these are “algebras” of unidirectional graphs. The Int construction 
turns TSMCs into compCCs, which are “algebras” of bidirectional graphs. 

Yet another restriction is given by (rightward open) Markov chains (MCs). 
See the bottom row of Fig. 3. This MDP-to-MC restriction greatly simplifies our 
semantic development, freeing us from the bookkeeping of different schedulers. 
In fact, we can introduce (optimal memoryless) schedulers systematically by 


46 K. Watanabe et al. 


the categorical construction called change of base [5,6]; this way we obtain the
semantic category S_r from S^MC.


2.2 Open MDPs 


We first introduce open MDPs; they have open ends via which they compose.
They come with a notion of arity: the numbers of open ends on their left and
right, distinguishing leftward and rightward ones. For example, the one in the
illustration (2) is from (2,1) to (1,3).

[Illustration (2): an open MDP from (2,1) to (1,3), with its entrances and exits on both sides.]


Definition 2.1 (open MDP (oMDP)). Let $A$ be a non-empty finite set,
whose elements are called actions. An open MDP $\mathcal{A}$ (over the action set $A$) is
the tuple $(\overline{m}, \overline{n}, Q, A, E, P, R)$ of the following data. We say that it is from
$\overline{m}$ to $\overline{n}$.


1. $\overline{m} = (m_r, m_l)$ and $\overline{n} = (n_r, n_l)$ are pairs of natural numbers; they are called
the left-arity and the right-arity, respectively. Moreover (see (2)), elements
of $[m_r + n_l]$ are called entrances, and those of $[n_r + m_l]$ are called exits.

2. $Q$ is a finite set of positions.

3. $E : [m_r + n_l] \to Q + [n_r + m_l]$ is an entry function, which maps each entrance
to either a position (in $Q$) or an exit (in $[n_r + m_l]$).

4. $P : Q \times A \times (Q + [n_r + m_l]) \to \mathbb{R}_{\ge 0}$ determines transition probabilities, where
we require $\sum_{s' \in Q + [n_r + m_l]} P(s, a, s') \in \{0, 1\}$ for each $s \in Q$ and $a \in A$.

5. $R$ is a reward function $R : Q \to \mathbb{R}_{\ge 0}$.

6. We impose the following “unique access to each exit” condition. Let
$\mathrm{exits} : ([m_r + n_l] + Q) \to \mathcal{P}([n_r + m_l])$ be the exit function that collects all immediately
reachable exits, that is, 1) for each $s \in Q$, $\mathrm{exits}(s) = \{ t \in [n_r + m_l] \mid \exists a \in A.\ P(s, a, t) > 0 \}$,
and 2) for each entrance $s \in [m_r + n_l]$, $\mathrm{exits}(s) = \{E(s)\}$
if $E(s)$ is an exit and $\mathrm{exits}(s) = \emptyset$ otherwise.

- For all $s, s' \in [m_r + n_l] + Q$, if $\mathrm{exits}(s) \cap \mathrm{exits}(s') \neq \emptyset$, then $s = s'$.

- We further require that each exit is reached from an identical position
by at most one action. That is, for each exit $t \in [n_r + m_l]$, $s \in Q$, and
$a, b \in A$, if both $P(s, a, t) > 0$ and $P(s, b, t) > 0$, then $a = b$.


Note that the unique access to each exit condition is for technical convenience; 
this can be easily enforced by adding an extra “access” position to an exit. 

We define the semantics of open MDPs, which is essentially the standard 
semantics of MDPs given by expected cumulative rewards. In this paper, it 
suffices to consider memoryless schedulers (see Remark 2.1). 


Definition 2.2 (path and scheduler). Let $\mathcal{A} = (\overline{m}, \overline{n}, Q, A, E, P, R)$ be an
open MDP. A (finite) path $\pi^{(i,j)}$ in $\mathcal{A}$ from an entrance $i \in [m_r + n_l]$ to an exit
$j \in [n_r + m_l]$ is a finite sequence $i, s_1, \ldots, s_n, j$ such that $E(i) = s_1$ and for all
$k \in [n]$, $s_k \in Q$. For each $k \in [n]$, $\pi^{(i,j)}_k$ denotes $s_k$, and $\pi^{(i,j)}_{n+1}$ denotes $j$. The
set of all paths in $\mathcal{A}$ from $i$ to $j$ is denoted by $\mathrm{Path}^{\mathcal{A}}(i, j)$.

A (memoryless) scheduler $\tau$ of $\mathcal{A}$ is a function $\tau : Q \to A$.
Remark 2.1. It is well-known (as hinted in [2]) that we can restrict to memoryless
schedulers for optimal expected rewards, assuming that the MDP in question is
almost surely terminating under any scheduler (†). We require the assumption
(†) in our compositional framework, too, and it is true in all benchmarks in this
paper. The assumption (†) must be checked only for the top-level (composed)
MDP; (†) for its components can then be deduced.


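Definition 2.3 (probability and reward of a path). Let $\mathcal{A} = (\overline{m}, \overline{n}, Q, A, E, P, R)$
be an open MDP, $\tau : Q \to A$ be a scheduler of $\mathcal{A}$, and $\pi^{(i,j)}$ be a
path in $\mathcal{A}$. The probability $\mathrm{Pr}^{\mathcal{A},\tau}(\pi^{(i,j)})$ of $\pi^{(i,j)}$ under $\tau$ is
$$\mathrm{Pr}^{\mathcal{A},\tau}(\pi^{(i,j)}) := \prod_{k=1}^{n} P\bigl(\pi^{(i,j)}_k, \tau(\pi^{(i,j)}_k), \pi^{(i,j)}_{k+1}\bigr).$$
The reward $\mathrm{Rw}^{\mathcal{A}}(\pi^{(i,j)})$ along the path $\pi^{(i,j)}$ is
the sum of the position rewards, that is, $\mathrm{Rw}^{\mathcal{A}}(\pi^{(i,j)}) := \sum_{k \in [n]} R(\pi^{(i,j)}_k)$.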


Our target problem on open MDPs is to compute the expected cumulative 
reward collected in a passage from a specified entrance i to a specified exit j. This
is defined below, together with reachability probability, in the usual manner. 


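Definition 2.4 (reachability probability and expected (cumulative)
reward of open MDPs). Let $\mathcal{A}$ be an open MDP and $\tau$ be a scheduler, as
in Definition 2.2. Let $i$ be an entrance and $j$ be an exit.

The reachability probability $\mathrm{RPr}^{\mathcal{A},\tau}(i, j)$ from $i$ to $j$, in $\mathcal{A}$ under $\tau$, is defined
by
$$\mathrm{RPr}^{\mathcal{A},\tau}(i, j) := \sum_{\pi^{(i,j)} \in \mathrm{Path}^{\mathcal{A}}(i,j)} \mathrm{Pr}^{\mathcal{A},\tau}(\pi^{(i,j)}).$$

The expected (cumulative) reward $\mathrm{ERw}^{\mathcal{A},\tau}(i, j)$ from $i$ to $j$, in $\mathcal{A}$ under $\tau$,
is defined by
$$\mathrm{ERw}^{\mathcal{A},\tau}(i, j) := \sum_{\pi^{(i,j)} \in \mathrm{Path}^{\mathcal{A}}(i,j)} \mathrm{Pr}^{\mathcal{A},\tau}(\pi^{(i,j)}) \cdot \mathrm{Rw}^{\mathcal{A}}(\pi^{(i,j)}).$$
Note that the infinite sum here always converges to a finite value; this is because there
are only finitely many positions in $\mathcal{A}$. See e.g. [1].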
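
The two quantities of Definition 2.4 can be computed without enumerating paths. The following Python sketch is an illustration only, not the paper's CompMDP implementation. It assumes that a memoryless scheduler has already been fixed, so that the open MDP behaves like a Markov chain described by three hypothetical inputs: `T` (position-to-position probabilities), `X` (position-to-exit probabilities) and `R` (position rewards). It also assumes almost-sure termination, so that the matrix I − T is invertible. Under these assumptions the path sums satisfy (I − T) v = X and (I − T) w = diag(R) v.

```python
from fractions import Fraction as F

def solve(a, b):
    """Solve the square linear system a x = b exactly (Gauss-Jordan)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)  # pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[r][n] for r in range(n)]

def rpr_erw(T, X, R):
    """Return (v, w) where v[s][j] is the reachability probability from
    position s to exit j, and w[s][j] is the sum over all paths from s to j
    of (path probability) * (path reward), as in Definition 2.4."""
    n, k = len(T), len(X[0])
    i_minus_t = [[F(int(s == t)) - T[s][t] for t in range(n)] for s in range(n)]
    v_cols, w_cols = [], []
    for j in range(k):
        v_col = solve(i_minus_t, [X[s][j] for s in range(n)])
        w_col = solve(i_minus_t, [R[s] * v_col[s] for s in range(n)])
        v_cols.append(v_col)
        w_cols.append(w_col)
    v = [[v_cols[j][s] for j in range(k)] for s in range(n)]
    w = [[w_cols[j][s] for j in range(k)] for s in range(n)]
    return v, w

# Hypothetical example: positions q0 (reward 1) and q1 (reward 2).
# q0 moves to q1 or to exit 1, q1 moves to q0 or to exit 2, each with 1/2.
T = [[F(0), F(1, 2)], [F(1, 2), F(0)]]
X = [[F(1, 2), F(0)], [F(0), F(1, 2)]]
R = [F(1), F(2)]
v, w = rpr_erw(T, X, R)
print(v[0], w[0])
```

If the single entrance is mapped to q0, the row `v[0]` gives the reachability probabilities and `w[0]` the expected rewards from that entrance to the two exits. The expected rewards are not conditional: a rarely reached exit gets a small value.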


Remark 2.2. In standard definitions such as Definition 2.4, it is common to
either 1) assume $\mathrm{RPr}^{\mathcal{A},\tau}(i, j) = 1$ for technical convenience [11], or 2) allow
$\mathrm{RPr}^{\mathcal{A},\tau}(i, j) < 1$, but in that case define $\mathrm{ERw}^{\mathcal{A},\tau}(i, j) := \infty$ [1]. These definitions
are not suited for our purpose (and for compositional model checking in
general), since we take into account multiple exits, to each of which the
reachability probability is typically $< 1$, and we need non-$\infty$ expected rewards over
those exits for compositionality. Note that our definition of expected reward is
not conditional (unlike [1, Rem. 10.74]): when the reachability probability from $i$
to $j$ is small, it makes the expected reward small as well. Our notion of expected
reward can be thought of as a “weighted sum” of rewards.


2.3 Rightward Open MDPs and Traced Monoidal String Diagrams 


Following the outline (Sect. 2.1), in this section we focus on (unidirectional) right- 
ward open MDPs and introduce the “algebra” roMDP of them. The operations 
;, ⊕, tr of traced symmetric monoidal categories (TSMCs) compose rightward
open MDPs in string diagrams. 
[Figure 4 diagram: an roMDP A : l + m → l + n is turned into tr_{l;m,n}(A) : m → n by connecting its l loop exits back to its l loop entrances.]

Fig. 4. The trace operator. 


Definition 2.5 (rightward open MDP (roMDP)). An open MDP $\mathcal{A} =
(\overline{m}, \overline{n}, Q, A, E, P, R)$ is rightward if all its entrances are on the left and all its
exits are on the right, that is, $\overline{m} = (m_r, 0)$ and $\overline{n} = (n_r, 0)$ for some $m_r$ and
$n_r$. We write $\mathcal{A} = (m_r, n_r, Q, A, E, P, R)$, dropping 0 from the arities.

We say that a rightward open MDP $\mathcal{A}$ is from $m$ to $n$, writing $\mathcal{A} : m \to n$,
if it is from $(m, 0)$ to $(n, 0)$ as an open MDP.


We use an equivalence relation by roMDP isomorphism so that roMDPs 
satisfy TSMC axioms given in Sect. 2.4. See [21, Appendix A] for details. 

We move on to introduce algebraic operations for composing rightward open 
MDPs. Two of them, namely sequential composition ; and sum ⊕, look like Fig. 2
except that all wires are rightward. The other major operation is the trace oper- 
ator tr that realizes (unidirectional) loops, as illustrated in Fig. 4. 


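Definition 2.6 (sequential composition ; of roMDPs). Let $\mathcal{A} : m \to k$
and $\mathcal{B} : k \to n$ be rightward open MDPs with the same action set $A$ and with
matching arities. Their sequential composition $\mathcal{A} ; \mathcal{B} : m \to n$ is given by
$\mathcal{A} ; \mathcal{B} := (m, n, Q^{\mathcal{A}} + Q^{\mathcal{B}}, A, E^{\mathcal{A};\mathcal{B}}, P^{\mathcal{A};\mathcal{B}}, [R^{\mathcal{A}}, R^{\mathcal{B}}])$, where


- $E^{\mathcal{A};\mathcal{B}}(i) := E^{\mathcal{A}}(i)$ if $E^{\mathcal{A}}(i) \in Q^{\mathcal{A}}$, and $E^{\mathcal{A};\mathcal{B}}(i) := E^{\mathcal{B}}(E^{\mathcal{A}}(i))$ otherwise (if
the $\mathcal{A}$-entrance $i$ goes to an $\mathcal{A}$-exit which is identified with a $\mathcal{B}$-entrance);
- the transition probabilities are defined in the following natural manner:

$$P^{\mathcal{A};\mathcal{B}}(s^{\mathcal{A}}, a, s') := \begin{cases} P^{\mathcal{A}}(s^{\mathcal{A}}, a, s') & \text{if } s' \in Q^{\mathcal{A}}, \\ \sum_{i \in [k]} P^{\mathcal{A}}(s^{\mathcal{A}}, a, i) \cdot \delta_{E^{\mathcal{B}}(i) = s'} & \text{otherwise (i.e. } s' \in Q^{\mathcal{B}} + [n]\text{)}, \end{cases}$$

$$P^{\mathcal{A};\mathcal{B}}(s^{\mathcal{B}}, a, s') := \begin{cases} P^{\mathcal{B}}(s^{\mathcal{B}}, a, s') & \text{if } s' \in Q^{\mathcal{B}} + [n], \\ 0 & \text{otherwise,} \end{cases}$$

where $\delta$ is a characteristic function (returning 1 if the condition is true);
- and $[R^{\mathcal{A}}, R^{\mathcal{B}}] : Q^{\mathcal{A}} + Q^{\mathcal{B}} \to \mathbb{R}_{\ge 0}$ combines $R^{\mathcal{A}}, R^{\mathcal{B}}$ by case distinction.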
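
The construction in Definition 2.6 can be sketched on a small dictionary-based encoding. The encoding below (keys `m`, `n`, `E`, `P`, `R`, and the tags `"pos"` and `"exit"`) is a hypothetical choice made for this illustration; it is not taken from the paper or from CompMDP. Positions of the composite are tagged with the component they come from, which plays the role of the disjoint union of the two position sets.

```python
from fractions import Fraction as F

def compose(A, B):
    """Sequential composition A ; B of rightward open MDPs.

    An roMDP is a dict with keys (an assumed encoding):
      m, n : numbers of entrances and exits,
      E    : list of length m; E[i-1] is ("pos", q) or ("exit", j),
      P    : dict (q, action) -> dict target -> probability,
      R    : dict q -> reward.
    """
    assert A["n"] == B["m"], "arities must match"

    def via_b_entrance(i):
        # An A-exit i is identified with the B-entrance i.
        kind, x = B["E"][i - 1]
        return ("pos", ("B", x)) if kind == "pos" else ("exit", x)

    def lift(tag, target):
        # Rename a target of component `tag` into the composite.
        kind, x = target
        if kind == "pos":
            return ("pos", (tag, x))
        return via_b_entrance(x) if tag == "A" else ("exit", x)

    P = {}
    for tag, M in (("A", A), ("B", B)):
        for (q, a), dist in M["P"].items():
            row = P.setdefault(((tag, q), a), {})
            for target, pr in dist.items():
                t = lift(tag, target)
                row[t] = row.get(t, 0) + pr  # the sum over intermediate open ends
    R = {(tag, q): r for tag, M in (("A", A), ("B", B)) for q, r in M["R"].items()}
    E = [lift("A", e) for e in A["E"]]
    return {"m": A["m"], "n": B["n"], "E": E, "P": P, "R": R}

# Hypothetical components: A : 1 -> 2 and B : 2 -> 2.
A = {"m": 1, "n": 2, "E": [("pos", "a")],
     "P": {("a", "go"): {("exit", 1): F(1, 2), ("exit", 2): F(1, 2)}},
     "R": {"a": F(1)}}
B = {"m": 2, "n": 2, "E": [("pos", "b"), ("exit", 2)],
     "P": {("b", "go"): {("exit", 1): F(1)}},
     "R": {"b": F(3)}}
AB = compose(A, B)
print(AB["P"][(("A", "a"), "go")])
```

In the example, the second entrance of B is wired directly to its second exit, so half of the probability mass leaving position `a` reaches exit 2 of the composite without visiting any position of B.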


Defining sum ⊕ of roMDPs is straightforward, following Fig. 2. See [21,
Appendix A] for details. 

The trace operator tr is primitive in the TSMC roMDP; it is crucial in 
defining bidirectional sequential composition shown in Fig. 2 (cf. Definition 2.9). 


Definition 2.7 (the trace operator $\mathrm{tr}_{l;m,n}$ over roMDPs). Let $\mathcal{A} : l + m \to
l + n$ be a rightward open MDP. The trace $\mathrm{tr}_{l;m,n}(\mathcal{A}) : m \to n$ of $\mathcal{A}$ with respect
to $l$ is the roMDP $\mathrm{tr}_{l;m,n}(\mathcal{A}) := (m, n, Q^{\mathcal{A}}, A, E, P, R^{\mathcal{A}})$ (cf. Fig. 4), where

- The entry function $E$ is defined naturally, using a sequence $i_0, \ldots, i_{k-1}$ of
intermediate open ends (in $[l]$) until reaching a destination $i_k$.
Precisely, we let $i_0 := i + l$ and $i_j := E^{\mathcal{A}}(i_{j-1})$ for each $j$. We let $k$ be
the first index at which $i_k$ comes out of the loop, that is, 1) $i_j \in [l]$ for each
$j \in [k-1]$, and 2) $i_k \in [l+1, l+n] + Q^{\mathcal{A}}$. Then we define $E(i)$ by the following:
$E(i) := i_k - l$ if $i_k \in [l+1, l+n]$; and $E(i) := i_k$ otherwise.

- The transition probabilities $P$ are defined as follows. We let $\mathrm{prec}(t)$ be the set
of open ends in $[l]$ (those which are in the loop) that eventually enter $\mathcal{A}$ at
$t \in [l+1, l+n] + Q^{\mathcal{A}}$. Precisely, $\mathrm{prec}(t) := \{ i \in [l] \mid \exists i_0, \ldots, i_k.\ i_0 = i,\ i_{j+1} =
E^{\mathcal{A}}(i_j) \text{ (for each } j),\ i_k = t,\ i_0, \ldots, i_{k-1} \in [1, l],\ i_k \in [l+1, l+n] + Q^{\mathcal{A}} \}$. Using this,

$$P(q, a, q') := \begin{cases} P^{\mathcal{A}}(q, a, q' + l) + \sum_{i \in \mathrm{prec}(q' + l)} P^{\mathcal{A}}(q, a, i) & \text{if } q' \in [n], \\ P^{\mathcal{A}}(q, a, q') + \sum_{i \in \mathrm{prec}(q')} P^{\mathcal{A}}(q, a, i) & \text{otherwise, i.e. if } q' \in Q^{\mathcal{A}}. \end{cases}$$

Here $Q^{\mathcal{A}}$ and $[l]$ are assumed to be disjoint without loss of generality.


Remark 2.3. In string diagrams, it is common to annotate a wire with its type,
such as a wire labeled $n$ for $\mathrm{id}_n : n \to n$. It is also common to separate a wire for a sum type
into wires of its component types, such as below on the left. Therefore the two
diagrams below on the right designate the same mathematical entity. Note that,
on its right-hand side, the type annotation 1 to each wire is omitted.

[Diagrams omitted: a wire of type m + n split into wires of types m and n, and two equivalent drawings of an roMDP.]


2.4 TSMC Equations Between roMDPs 


Here we show that the three operations ;, ⊕, tr on roMDPs satisfy the equational
axioms of TSMCs [10], shown in Fig. 5. These equational axioms are not directly 
needed for compositional model checking. We nevertheless study them because 
1) they validate some natural bookkeeping equivalences of roMDPs needed for 
their efficient handling, and 2) they act as a sanity check of the mathematical 
authenticity of our compositional framework. For example, the handling of open 
ends is subtle in Sect. 2.3—e.g. whether they should be positions or not—and 
the TSMC equational axioms led us to our current definitions. 

The TSMC axioms use some “positionless” roMDPs as wires, such as identities
I_m (→ in string diagrams) and swaps S_{m,n} (✕). See [21, Appendix A] for
details. The proof of the following is routine. For details, see [21, Appendix B].


Theorem 2.1. The three operations ;, ⊕, tr on roMDPs, defined in Sect. 2.3,
satisfy the equational axioms in Fig. 5 up to isomorphism (see [21, Appendix A]
for details).


Corollary 2.1 (a TSMC roMDP). Let roMDP be the category whose objects
are natural numbers and whose arrows are roMDPs over the action set A modulo
isomorphisms. Then the operations ;, ⊕, tr, I, S make roMDP a traced symmetric
monoidal category (TSMC).
(;-Unit)        $I_m ; \mathcal{A} = \mathcal{A} = \mathcal{A} ; I_n$
(;-Assoc)       $\mathcal{A} ; (\mathcal{B} ; \mathcal{C}) = (\mathcal{A} ; \mathcal{B}) ; \mathcal{C}$
(⊕-Assoc)       $(\mathcal{A} \oplus \mathcal{B}) \oplus \mathcal{C} = \mathcal{A} \oplus (\mathcal{B} \oplus \mathcal{C})$
(Bifunc1)       $I_m \oplus I_n = I_{m+n}$
(Bifunc2)       $(\mathcal{A} \oplus \mathcal{B}) ; (\mathcal{C} \oplus \mathcal{D}) = (\mathcal{A} ; \mathcal{C}) \oplus (\mathcal{B} ; \mathcal{D})$
(Swap1)         $S_{m,0} = I_m$
(Swap2)         $S_{l,m+n} = (S_{l,m} \oplus I_n) ; (I_m \oplus S_{l,n})$
(Swap3)         $S_{m,n} ; S_{n,m} = I_{m+n}$
(Vanishing1)    $\mathrm{tr}_{0;m,m}(I_m) = I_m$
(Vanishing2)    (string diagram in the original figure)
(Superposing)   (string diagram in the original figure)
(Yanking)       $\mathrm{tr}_{m;m,m}(S_{m,m}) = I_m$
(Naturality1)   (string diagram in the original figure)
(Naturality2)   $\mathrm{tr}_{l;m,n}(\mathcal{A} ; (I_l \oplus \mathcal{B})) = \mathrm{tr}_{l;m,k}(\mathcal{A}) ; \mathcal{B}$
(Dinaturality)  (string diagram in the original figure)



Fig. 5. The equational axioms of TSMCs, expressed for roMDPs, with some string 
diagram illustrations. Here we omit types of roMDPs; see [10] for details. 


2.5 Open MDPs and “Compact Closed” String Diagrams 


Following the outline in Sect.2.1, we now introduce a bidirectional “compact 
closed” calculus of open MDPs (oMDPs), using the Int construction [10] that 
turns TSMCs in general into compact closed categories (compCCs). 

The following definition simply says oMDP := Int(roMDP), although it 
uses concrete terms adapted to the current context. 


Definition 2.8 (the category oMDP). The category oMDP of open MDPs
is defined as follows. Its objects are pairs (m_r, m_l) of natural numbers. Its arrows
are defined by rightward open MDPs as follows:


    an arrow (m_r, m_l) → (n_r, n_l) in oMDP
    ==========================================================    (3)
    an arrow A : m_r + n_l → n_r + m_l in roMDP, i.e. an roMDP


where the double lines == mean “is the same thing as.”


The definition may not immediately justify its name: no open MDPs appear 
there; only roMDPs do. The point is that we identify the roMDP A in (3) 
with the oMDP W(A) of the designated type, using “twists” in Fig.6. See [21, 
Appendix A] for details. 

We move on to describe algebraic operations for composing oMDPs. These 
operations come from the structure of oMDP as a compCC; the latter, in turn, 
arises canonically from the Int construction. 


Definition 2.9 (; of oMDPs). Let $\mathcal{A} : (m_r, m_l) \to (l_r, l_l)$ and $\mathcal{B} : (l_r, l_l) \to
(n_r, n_l)$ be arrows in oMDP with the same action set $A$. Their sequential
composition $\mathcal{A} ; \mathcal{B} : (m_r, m_l) \to (n_r, n_l)$ is defined by the string diagram in Fig. 7,
formulated in roMDP. Textually the definition is

$$\mathcal{A} ; \mathcal{B} := \mathrm{tr}_{l_l;\, m_r + n_l,\, n_r + m_l}\bigl( (S_{l_l, m_r} \oplus I_{n_l}) ; (\mathcal{A} \oplus I_{n_l}) ; (I_{l_r} \oplus S_{m_l, n_l}) ; (\mathcal{B} \oplus I_{m_l}) ; (S_{n_r, l_l} \oplus I_{m_l}) \bigr).$$


Fig. 6. Turning oMDPs to roMDPs, and vice versa, via twists.

Fig. 7. String diagrams in roMDP for A ; B, A ⊕ B in oMDP.



The definition of sum ⊕ of oMDPs is similarly shown in the string diagram
in Fig. 7, formulated in roMDP. Definition of “wires” such as identities, swaps,
units (⊂ in string diagrams) and counits (⊃) is easy, too.


Theorem 2.2 (oMDP is a compCC). The category oMDP (Definition 2.8),
equipped with the operations ;, ⊕, is a compCC.


3 Decomposition Equalities for Open Markov Chains 


Here we exhibit some basic equalities that decompose the behavior of (rightward 
open) Markov chains. We start with such equalities on reachability probabilities 
(which are widely known) and extend them to equalities on expected rewards 
(which seem less known). Notably, the latter equalities involve not only expected 
rewards but also reachability probabilities. 

Here we focus on rightward open Markov chains (roMCs), since the extension 
to richer settings is taken care of by categorical constructions. See Fig. 3. 


Definition 3.1 (roMC). A rightward open Markov chain (roMC) C from m 
to n is an roMDP from m to n over the singleton action set {x}. 

For an roMC $\mathcal{C}$, its reachability probability $\mathrm{RPr}^{\mathcal{C}}(i, j)$ and expected reward
$\mathrm{ERw}^{\mathcal{C}}(i, j)$ are defined as in Definition 2.4. The scheduler $\tau$ is omitted since it
is unique.

Rightward open MCs, as a special case of roMDPs, form a TSMC (Corol- 
lary 2.1). It is denoted by roMC. 


The following equalities are well-known, although they are not stated in terms 
of open MCs. Recall that $\mathrm{RPr}^{\mathcal{C}}(i, k)$ is the probability of reaching the exit $k$
from the entrance $i$ in $\mathcal{C}$ (Definition 2.4). Recall also the definitions of $\mathcal{C} ; \mathcal{D}$
(Definition 2.6) and $\mathrm{tr}_{l;m,n}(\mathcal{E})$ (Definition 2.7), which are essentially as in Fig. 2
and Fig. 4.
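Proposition 3.1 (decomposition equalities for RPr). Let $\mathcal{C} : m \to l$, $\mathcal{D} :
l \to n$ and $\mathcal{E} : l + m \to l + n$ be roMCs. The following matrix equalities hold.

$$\bigl[\mathrm{RPr}^{\mathcal{C};\mathcal{D}}(i, j)\bigr]_{i \in [m], j \in [n]} = \bigl[\mathrm{RPr}^{\mathcal{C}}(i, k)\bigr]_{i \in [m], k \in [l]} \cdot \bigl[\mathrm{RPr}^{\mathcal{D}}(k, j)\bigr]_{k \in [l], j \in [n]}, \tag{4}$$

$$\bigl[\mathrm{RPr}^{\mathrm{tr}_{l;m,n}(\mathcal{E})}(i, j)\bigr]_{i \in [m], j \in [n]} = \bigl[\mathrm{RPr}^{\mathcal{E}}(l + i, l + j)\bigr]_{i \in [m], j \in [n]} + \sum_{d \in \mathbb{N}} A \cdot B^{d} \cdot C. \tag{5}$$

Here $\bigl[\mathrm{RPr}^{\mathcal{C};\mathcal{D}}(i, j)\bigr]_{i \in [m], j \in [n]}$ denotes the $m \times n$ matrix with the designated
components; other matrices are similar. The matrices $A, B, C$ are given by
$A := \bigl[\mathrm{RPr}^{\mathcal{E}}(l + i, k)\bigr]_{i \in [m], k \in [l]}$, $B := \bigl[\mathrm{RPr}^{\mathcal{E}}(k, k')\bigr]_{k \in [l], k' \in [l]}$, and
$C := \bigl[\mathrm{RPr}^{\mathcal{E}}(k', l + j)\bigr]_{k' \in [l], j \in [n]}$. In the last line, note that the matrix in the middle is the $d$-th power.


The first equality is easy, distinguishing cases on the intermediate open end
$k$ (mutually exclusive since MCs are rightward). The second says that a run
through $\mathrm{tr}_{l;m,n}(\mathcal{E})$ either leaves $\mathcal{E}$ directly or first goes around the loop, where $d$
counts the additional passes through $\mathcal{E}$ within the loop,

[Diagram omitted: the trace of E unfolded into a sum, over d, of sequential unrollings of E.]

which is intuitive. Here, the small circles in the diagram correspond to dead
ends. It is known as Girard’s execution formula [7] in linear logic.
We now extend Prop. 3.1 to expected rewards $\mathrm{ERw}^{\mathcal{C}}(i, j)$.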


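Proposition 3.2 (decomposition eq. for ERw). Let $\mathcal{C} : m \to l$, $\mathcal{D} : l \to n$
and $\mathcal{E} : l + m \to l + n$ be roMCs. The following equalities of matrices hold.

$$\bigl[\mathrm{ERw}^{\mathcal{C};\mathcal{D}}(i, j)\bigr]_{i \in [m], j \in [n]} = \bigl[\mathrm{RPr}^{\mathcal{C}}(i, k)\bigr]_{i \in [m], k \in [l]} \cdot \bigl[\mathrm{ERw}^{\mathcal{D}}(k, j)\bigr]_{k \in [l], j \in [n]} + \bigl[\mathrm{ERw}^{\mathcal{C}}(i, k)\bigr]_{i \in [m], k \in [l]} \cdot \bigl[\mathrm{RPr}^{\mathcal{D}}(k, j)\bigr]_{k \in [l], j \in [n]}, \tag{6}$$

$$\bigl[\mathrm{ERw}^{\mathrm{tr}_{l;m,n}(\mathcal{E})}(i, j)\bigr]_{i \in [m], j \in [n]} = \bigl[\mathrm{ERw}^{\mathcal{E}}(l + i, l + j)\bigr]_{i \in [m], j \in [n]} + \sum_{d \in \mathbb{N}} A \cdot B^{d} \cdot C. \tag{7}$$

Here $A, B, C$ are the following $m \times 2l$, $2l \times 2l$, $2l \times n$ matrices.

$$A = \Bigl( \bigl[\mathrm{RPr}^{\mathcal{E}}(l + i, k)\bigr]_{i \in [m], k \in [l]} \quad \bigl[\mathrm{ERw}^{\mathcal{E}}(l + i, k)\bigr]_{i \in [m], k \in [l]} \Bigr),$$

$$B = \begin{pmatrix} \bigl[\mathrm{RPr}^{\mathcal{E}}(k, k')\bigr]_{k \in [l], k' \in [l]} & \bigl[\mathrm{ERw}^{\mathcal{E}}(k, k')\bigr]_{k \in [l], k' \in [l]} \\ \bigl[0\bigr]_{k \in [l], k' \in [l]} & \bigl[\mathrm{RPr}^{\mathcal{E}}(k, k')\bigr]_{k \in [l], k' \in [l]} \end{pmatrix},$$

$$C = \begin{pmatrix} \bigl[\mathrm{ERw}^{\mathcal{E}}(k', l + j)\bigr]_{k' \in [l], j \in [n]} \\ \bigl[\mathrm{RPr}^{\mathcal{E}}(k', l + j)\bigr]_{k' \in [l], j \in [n]} \end{pmatrix}.$$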

Proposition 3.2 seems new, although proving it is not hard once the statement
is given (see [21, Appendix C] for details). The equalities enable one to compute
the expected rewards of composite roMCs C ; D and tr_{l;m,n}(E) from those of
component roMCs C, D, E. They also signify the role of reachability probabilities in
such computation, suggesting their use in the definition of semantic categories
(cf. granularity of semantics in Sect. 1).

The last equalities in Propositions 3.1 and 3.2 involve infinite sums Σ_{d∈ℕ},
and one may wonder how to compute them. A key is their characterization as 
least fixed points via the Kleene theorem: the desired quantity on the left side 
(RPr or ERw) is a solution of a suitable linear equation; see Proposition 3.3. 
With the given definitions, the proof of Propositions 3.1 and 3.2 is (lengthy but) 
routine work (see e.g. [1, Thm. 10.15]). 


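Proposition 3.3 (linear equation characterization for (5) and (7)). Let
$\mathcal{E} : l + m \to l + n$ be an roMC, and $k \in [l + 1, l + n]$ be a specified exit of $\mathcal{E}$.
Consider the following linear equation on an unknown vector $[x_i]_{i \in [l+m]}$:

$$[x_i]_{i \in [l+m]} = \bigl[\mathrm{RPr}^{\mathcal{E}}(i, k)\bigr]_{i \in [l+m]} + \bigl[\mathrm{RPr}^{\mathcal{E}}(i, j)\bigr]_{i \in [l+m], j \in [l]} \cdot [x_j]_{j \in [l]}. \tag{8}$$

Consider the least solution $[\tilde{x}_i]_{i \in [l+m]}$ of the equation. Then its part $[\tilde{x}_{i+l}]_{i \in [m]}$
is given by the vector $\bigl(\mathrm{RPr}^{\mathrm{tr}_{l;m,n}(\mathcal{E})}(i, k - l)\bigr)_{i \in [m]}$ of suitable reachability probabilities.

Moreover, consider the following linear equation on an unknown $[y_i]_{i \in [l+m]}$:

$$[y_i]_{i \in [l+m]} = \bigl[\mathrm{ERw}^{\mathcal{E}}(i, k)\bigr]_{i \in [l+m]} + \bigl[\mathrm{ERw}^{\mathcal{E}}(i, j)\bigr]_{i \in [l+m], j \in [l]} \cdot [x_j]_{j \in [l]} + \bigl[\mathrm{RPr}^{\mathcal{E}}(i, j)\bigr]_{i \in [l+m], j \in [l]} \cdot [y_j]_{j \in [l]}, \tag{9}$$

where the unknown $[x_j]_{j \in [l]}$ is shared with (8). Consider the least solution
$[\tilde{y}_i]_{i \in [l+m]}$ of the equation. Then its part $[\tilde{y}_{i+l}]_{i \in [m]}$ is given by the vector of
suitable expected rewards, that is, $[\tilde{y}_{i+l}]_{i \in [m]} = \bigl(\mathrm{ERw}^{\mathrm{tr}_{l;m,n}(\mathcal{E})}(i, k - l)\bigr)_{i \in [m]}$.
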

We can modify the linear Eqs. (8,9)—removing unreachable positions, 
specifically—so that they have unique solutions without changing the least ones. 
One can then solve these linear equations to compute the reachabilities and 
expected rewards in (5,7). This is a well-known technique for computing reach- 
ability probabilities [1, Thm. 10.19]; it is not hard to confirm the correctness of 
our current extension to expected rewards. 
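
As a sanity check of how the linear equations (8) and (9) relate to the infinite sums in (5) and (7), the following Python sketch solves the equations for a small example and compares the result with partial sums of the series. It works purely on matrices of reachability probabilities and expected rewards; the inputs `P` and `W`, the 0-based indexing, and the helper names are assumptions of this illustration. It also assumes that the loop part of the equation has a unique solution, which is the situation after unreachable positions are removed.

```python
from fractions import Fraction as F

def solve(a, b):
    """Solve the square linear system a x = b exactly (Gauss-Jordan)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)  # pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[r][n] for r in range(n)]

def trace_semantics(P, W, l, k):
    """P[i][j], W[i][j]: reachability probability and expected reward of E
    from entrance i+1 to exit j+1 (0-based lists). The first l entrances
    and exits form the loop; k >= l is the 0-based column of the chosen exit.
    Returns (x, y), the solutions of equations (8) and (9)."""
    loop = range(l)
    a = [[F(int(i == j)) - P[i][j] for j in loop] for i in loop]
    x_loop = solve(a, [P[i][k] for i in loop])
    y_loop = solve(a, [W[i][k] + sum(W[i][j] * x_loop[j] for j in loop)
                       for i in loop])
    x = [P[i][k] + sum(P[i][j] * x_loop[j] for j in loop)
         for i in range(len(P))]
    y = [W[i][k] + sum(W[i][j] * x_loop[j] + P[i][j] * y_loop[j] for j in loop)
         for i in range(len(P))]
    return x, y

def series(P, W, l, entrance, k, terms):
    """Partial sums of (5) and (7) for one entrance row and exit column k."""
    loop = range(l)
    p = [P[entrance][j] for j in loop]  # probability part of the row of A
    r = [W[entrance][j] for j in loop]  # reward part of the row of A
    rp, er = P[entrance][k], W[entrance][k]
    for _ in range(terms):
        rp += sum(p[j] * P[j][k] for j in loop)
        er += sum(p[j] * W[j][k] + r[j] * P[j][k] for j in loop)
        p, r = ([sum(p[i] * P[i][j] for i in loop) for j in loop],
                [sum(p[i] * W[i][j] + r[i] * P[i][j] for i in loop) for j in loop])
    return rp, er

# Hypothetical data with l = m = n = 1: entrance/exit 1 is the loop wire.
P = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 4)]]
W = [[F(1), F(1)], [F(2), F(1)]]
x, y = trace_semantics(P, W, l=1, k=1)
rp, er = series(P, W, l=1, entrance=1, k=1, terms=200)
print(x[1], y[1], float(rp), float(er))
```

The entries `x[1]` and `y[1]` correspond to the single entrance of the traced chain. The partial sums approach them from below, in line with the least-fixed-point reading of the equations.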


4 Semantic Categories and Solution Functors 


We build on the decomposition equalities (Proposition 3.2) and define the seman- 
tic category S for compositional model checking. This is the main construct in 
our framework. Our definitions proceed in three steps, from roMCs to roMDPs to 
oMDPs (Fig. 3). The gaps between them are filled in using general constructions 
from category theory. 


4.1 Semantic Category for Rightward Open MCs 


We first define the semantic category S^MC for roMCs (Fig. 3, bottom right).
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Definition 4.1 (objects and arrows of SMC). The category SMC has natural 
numbers m as objects. Its arrow f: m — n is given by an assignment, for each 
pair (i,j) of i € [m] andj € |n], of a pair (Pi j, Ti j) of nonnegative real numbers. 
There pairs (pi j, Ti j) are subject to the following conditions. 


- (Subnormality) Y ejn] Pij <1 for each i € [m]. 


— (Realizability) pij = 0 implies r; j = 0. 


An illustration is in Fig. 8. For an object $m$, each $i \in [m]$ is identified with
an open end, much like in roMC and roMDP. For an arrow $f\colon m \to n$, the pair
$f(i,j) = (p_{i,j}, r_{i,j})$ encodes a reachability probability and an expected reward,
from an open end $i$ to $j$; together they represent a possible roMC behavior.

Fig. 8. An arrow $f\colon 2 \to 2$ in $\mathbb{S}_r^{\mathrm{MC}}$.

We go on to define the algebraic operations of $\mathbb{S}_r^{\mathrm{MC}}$ as a TSMC. While there
is a categorical description of $\mathbb{S}_r^{\mathrm{MC}}$ using a monad [16], we prefer a concrete
definition here. See [21, Appendix D] for the categorical definition of $\mathbb{S}_r^{\mathrm{MC}}$.


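Definition 4.2 (sequential composition ; of $\mathbb{S}_r^{\mathrm{MC}}$). Let $f\colon m \to l$ and
$g\colon l \to n$ be arrows in $\mathbb{S}_r^{\mathrm{MC}}$. Their sequential composition $f ; g\colon m \to n$ of
$f$ and $g$ is defined as follows: letting $f(i,j) = (p^{f}_{i,j}, r^{f}_{i,j})$ and
$g(i,j) = (p^{g}_{i,j}, r^{g}_{i,j})$, then $f ; g = (p^{f;g}_{i,j}, r^{f;g}_{i,j})_{i\in[m],\,j\in[n]}$ is given by

\[
\bigl[p^{f;g}_{i,j}\bigr]_{i\in[m],\,j\in[n]}
  = \bigl[p^{f}_{i,j}\bigr]_{i\in[m],\,j\in[l]} \cdot \bigl[p^{g}_{i,j}\bigr]_{i\in[l],\,j\in[n]},
\]
\[
\bigl[r^{f;g}_{i,j}\bigr]_{i\in[m],\,j\in[n]}
  = \bigl[p^{f}_{i,j}\bigr]_{i\in[m],\,j\in[l]} \cdot \bigl[r^{g}_{i,j}\bigr]_{i\in[l],\,j\in[n]}
  + \bigl[r^{f}_{i,j}\bigr]_{i\in[m],\,j\in[l]} \cdot \bigl[p^{g}_{i,j}\bigr]_{i\in[l],\,j\in[n]}.
\]

The sum $\oplus$ and the trace operator $\mathrm{tr}$ of $\mathbb{S}_r^{\mathrm{MC}}$ are defined similarly. To define
and prove axioms of the trace operator (Fig. 5), we exploit the categorical theory
of strong unique decomposition categories [9]. See [21, Appendix D].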
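
The sequential composition of Definition 4.2 amounts to one matrix product for the probabilities and a sum of two matrix products for the rewards. The Python sketch below spells this out; the representation of an arrow as a pair of nested lists `(p, r)` and the function names are assumptions made for illustration only.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matadd(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def seq_compose(f, g):
    """Sequential composition f ; g of arrows f: m -> l and g: l -> n.

    Each arrow is a pair (p, r) of matrices: reachability probabilities
    and (probability-weighted) expected rewards.
    """
    p_f, r_f = f
    p_g, r_g = g
    p = matmul(p_f, p_g)
    r = matadd(matmul(p_f, r_g), matmul(r_f, p_g))
    return p, r
```

For example, composing `f = ([[0.5, 0.5]], [[1.0, 2.0]])` with `g = ([[1.0], [0.5]], [[0.0], [1.0]])` gives probability 0.75 and expected reward 2.5 from the single entrance to the single exit.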


Definition 4.3 ($\mathbb{S}_r^{\mathrm{MC}}$ as a TSMC). $\mathbb{S}_r^{\mathrm{MC}}$ is a TSMC, with its operations
$;$, $\oplus$, $\mathrm{tr}$.


Once we expand the above definitions to concrete terms, it is evident that 
they mirror the decomposition equalities. Indeed, the sequential composition ; 
mirrors the first equalities in Propositions 3.1 and 3.2. The same holds for the 
trace operator, too. Therefore, one can think of the above categorical develop- 
ment in Definition 4.2 and Definition 4.3 as a structured lifting of the (local) 
equalities in Propositions 3.1 and 3.2 to the (global) categorical structures, as 
shown in Fig. 3. 

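Once we have found the semantic domain $\mathbb{S}_r^{\mathrm{MC}}$, the following definition is easy.


Definition 4.4 ($\mathcal{S}_r^{\mathrm{MC}}$). The solution functor $\mathcal{S}_r^{\mathrm{MC}}\colon \mathbf{roMC} \to \mathbb{S}_r^{\mathrm{MC}}$ is defined
as follows. It carries an object $m$ (a natural number) to the same $m$; it carries
an arrow $\mathcal{C}\colon m \to n$ in $\mathbf{roMC}$ to the arrow $\mathcal{S}_r^{\mathrm{MC}}(\mathcal{C})\colon m \to n$ in $\mathbb{S}_r^{\mathrm{MC}}$, defined by

\[
\mathcal{S}_r^{\mathrm{MC}}(\mathcal{C})(i,j) := \bigl(\mathrm{RPr}^{\mathcal{C}}(i,j),\ \mathrm{ERw}^{\mathcal{C}}(i,j)\bigr), \tag{10}
\]

using reachability probabilities and expected rewards (Definition 2.4).

Theorem 4.1 ($\mathcal{S}_r^{\mathrm{MC}}$ is compositional). The correspondence $\mathcal{S}_r^{\mathrm{MC}}$, defined
in (10), is a traced symmetric monoidal functor. That is,
$\mathcal{S}_r^{\mathrm{MC}}(\mathcal{C} ; \mathcal{D}) = \mathcal{S}_r^{\mathrm{MC}}(\mathcal{C}) ; \mathcal{S}_r^{\mathrm{MC}}(\mathcal{D})$,
$\mathcal{S}_r^{\mathrm{MC}}(\mathcal{C} \oplus \mathcal{D}) = \mathcal{S}_r^{\mathrm{MC}}(\mathcal{C}) \oplus \mathcal{S}_r^{\mathrm{MC}}(\mathcal{D})$, and
$\mathcal{S}_r^{\mathrm{MC}}(\mathrm{tr}(\mathcal{E})) = \mathrm{tr}(\mathcal{S}_r^{\mathrm{MC}}(\mathcal{E}))$. Here
$;$, $\oplus$, $\mathrm{tr}$ on the left are from Sect. 2.3; those on the right are from Definition 4.3.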


4.2 Semantic Category of Rightward Open MDPs 


We extend the theory in Sect. 4.1 from MCs to MDPs (Fig. 3). In particular,
on the semantics side, we have to bundle up all possible behaviors of an MDP
under different schedulers. We find that this is done systematically by change of
base [5,6]. We use the following notation for fixing a scheduler $\tau$.


Definition 4.5 (roMC $\mathrm{MC}(\mathcal{A},\tau)$ induced by $\mathcal{A},\tau$). Let $\mathcal{A}\colon m \to n$ be a
rightward open MDP and $\tau\colon Q^{\mathcal{A}} \to A$ be a memoryless scheduler. The rightward
open MC $\mathrm{MC}(\mathcal{A},\tau)$ induced by $\mathcal{A}$ and $\tau$ is
$(m, n, Q^{\mathcal{A}}, \{\star\}, E^{\mathcal{A}}, P^{\mathrm{MC}(\mathcal{A},\tau)}, R^{\mathcal{A}})$,
where for each $s \in Q$ and $t \in ([n_r + m_l] + Q)$, $P^{\mathrm{MC}(\mathcal{A},\tau)}(s, \star, t) := P^{\mathcal{A}}(s, \tau(s), t)$.


Much like in Sect. 4.1, we first describe the semantic category $\mathbb{S}_r$ in concrete
terms. We later use the categorical machinery to define its algebraic structure.


Definition 4.6 (objects and arrows of $\mathbb{S}_r$). The category $\mathbb{S}_r$ has natural
numbers $m$ as objects. Its arrow $F\colon m \to n$ is given by a set
$\{f_i\colon m \to n \text{ in } \mathbb{S}_r^{\mathrm{MC}}\}_{i\in I}$ of arrows of the same type in $\mathbb{S}_r^{\mathrm{MC}}$
($I$ is an arbitrary index set).


The above definition of arrows (collecting arrows in $\mathbb{S}_r^{\mathrm{MC}}$, each of which
corresponds to the behavior of $\mathrm{MC}(\mathcal{A}, \tau)$ for some $\tau$) follows from the change of base
construction (specifically with the powerset functor $\mathcal{P}$ on the category Set of sets).
Its general theory gives sequential composition $;$ for free (concretely described
in Definition 4.7), together with equational axioms. See [21, Appendix D]. Sum
$\oplus$ and trace $\mathrm{tr}$ are not covered by the general theory, but we can define them
analogously to $;$ in the current setting. Thus, for $\oplus$ and $\mathrm{tr}$ as well, we are using
change of base as an inspiration.

Here is a concrete description of the algebraic operations. It applies the
corresponding operation of $\mathbb{S}_r^{\mathrm{MC}}$ in the elementwise manner.


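Definition 4.7 ($;$, $\oplus$, $\mathrm{tr}$ in $\mathbb{S}_r$). Let $F\colon m \to l$, $G\colon l \to n$, $H\colon l+m \to l+n$
be arrows in $\mathbb{S}_r$. Their sequential composition $F ; G$ of $F$ and $G$ is given by
$F ; G := \{f ; g \mid f \in F,\ g \in G\}$ where $f ; g$ is the sequential composition of $f$
and $g$ in $\mathbb{S}_r^{\mathrm{MC}}$. The trace $\mathrm{tr}_{l;m,n}(H)\colon m \to n$ of $H$ with respect to $l$ is given by
$\mathrm{tr}_{l;m,n}(H) := \{\mathrm{tr}_{l;m,n}(h) \mid h \in H\}$ where $\mathrm{tr}_{l;m,n}(h)$ is the trace of $h$ with
respect to $l$ in $\mathbb{S}_r^{\mathrm{MC}}$.

Sum $\oplus$ in $\mathbb{S}_r$ is defined analogously, applying the operation in $\mathbb{S}_r^{\mathrm{MC}}$
elementwise. See [21, Appendix A] for details.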
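
The elementwise lifting in Definition 4.7 can be sketched in a few lines of Python. Here an arrow is represented, purely as an assumption for this example, as a tuple of rows whose entries are pairs `(p, r)`, so that arrows are hashable and can be collected in a set. The sketch assumes the intermediate arity `l` is at least 1.

```python
from itertools import product


def seq_compose(f, g):
    """Sequential composition of arrows f: m -> l and g: l -> n.

    An arrow is a tuple of rows; each entry is a pair (p, r) of a
    reachability probability and an expected reward.
    """
    m, l, n = len(f), len(g), len(g[0])
    return tuple(
        tuple(
            (
                sum(f[i][k][0] * g[k][j][0] for k in range(l)),
                sum(f[i][k][0] * g[k][j][1] + f[i][k][1] * g[k][j][0]
                    for k in range(l)),
            )
            for j in range(n)
        )
        for i in range(m)
    )


def compose_sets(big_f, big_g):
    """F ; G := { f ; g | f in F, g in G }, applied elementwise."""
    return {seq_compose(f, g) for f, g in product(big_f, big_g)}
```

With two candidate behaviors in `F` and one in `G`, the result contains one composite per combination; the size of such sets is what the pruning in the implementation section later keeps under control.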


Theorem 4.2. $\mathbb{S}_r$ is a TSMC.


We now define a solution functor and prove its compositionality. 




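Definition 4.8 ($\mathcal{S}_r$). The solution functor $\mathcal{S}_r\colon \mathbf{roMDP} \to \mathbb{S}_r$ is defined as
follows. It carries an object $m \in \mathbb{N}$ to $m$, and an arrow $\mathcal{A}\colon m \to n$ in $\mathbf{roMDP}$ to
$\mathcal{S}_r(\mathcal{A})\colon m \to n$ in $\mathbb{S}_r$. The latter is defined in the following elementwise manner,
using $\mathcal{S}_r^{\mathrm{MC}}$ in Definition 4.4.

\[
\mathcal{S}_r(\mathcal{A}) := \bigl\{ \mathcal{S}_r^{\mathrm{MC}}(\mathrm{MC}(\mathcal{A}, \tau)) \bigm| \tau\colon Q^{\mathcal{A}} \to A \text{ a (memoryless) scheduler} \bigr\}. \tag{11}
\]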
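
To make (11) concrete, the following Python sketch enumerates all memoryless schedulers of a tiny MDP, builds the induced MC for each, and collects the resulting pairs of reachability probability and probability-weighted expected reward. The data layout (a list of positions, each a dict from action names to lists of `(target, probability)`), the single exit marker `EXIT`, and the fixed iteration count are all assumptions chosen for brevity; enumeration is exponential in the number of positions and is shown only to illustrate the definition.

```python
from itertools import product

EXIT = "exit"  # hypothetical marker for the single exit of the example


def solve_mc(trans, reward, iters=1000):
    """Reachability probability p[s] and weighted expected reward r[s] to EXIT.

    trans[s] is a list of (target, probability); reward[s] is collected
    each time position s is visited. Both vectors are approximated by
    iterating from zero.
    """
    n = len(trans)
    p = [0.0] * n
    for _ in range(iters):
        p = [sum(q * (1.0 if t == EXIT else p[t]) for t, q in trans[s])
             for s in range(n)]
    r = [0.0] * n
    for _ in range(iters):
        r = [reward[s] * p[s] + sum(q * r[t] for t, q in trans[s] if t != EXIT)
             for s in range(n)]
    return p, r


def solution_set(mdp, reward, entrance=0):
    """Collect (probability, reward) pairs over all memoryless schedulers."""
    positions = range(len(mdp))
    results = set()
    for choice in product(*(sorted(mdp[s]) for s in positions)):
        induced = [mdp[s][choice[s]] for s in positions]  # the induced MC
        p, r = solve_mc(induced, reward)
        results.add((round(p[entrance], 9), round(r[entrance], 9)))
    return results
```

In an MDP where position 0 can either exit directly (reward 1) or gamble on reaching a second position worth reward 2 with probability 0.5, the two schedulers yield the pairs (1.0, 1.0) and (0.5, 1.5).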


Theorem 4.3 (compositionality). The correspondence $\mathcal{S}_r\colon \mathbf{roMDP} \to \mathbb{S}_r$ is
a traced symmetric monoidal functor, preserving $;$, $\oplus$, $\mathrm{tr}$ as in Thm. 4.1.


Remark 4.1 (memoryless schedulers). Our restriction to memoryless schedulers 
(cf. Definition 2.2) plays a crucial role in the proof of Theorem 4.3, specifically 
for the trace operator (i.e. loops, cf. Fig. 4). Intuitively, a memoryful scheduler 
for a loop may act differently in different iterations. Its technical consequence 
is that the elementwise definition of tr, as in Definition 4.7, no longer works for 
memoryful schedulers. 


4.3 Semantic Category of MDPs 


Finally, we extend from (unidirectional) roMDPs to (bidirectional) oMDPs (i.e. 
from the second to the first row in Fig. 3). The system-side construction is already 
presented in Sect. 2.5; the semantical side, described here, follows the same Int 
construction [10]. The common intuition is that of twists, see Fig. 6. 


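Definition 4.9 (the semantic category $\mathbb{S}$). We define $\mathbb{S} = \mathrm{Int}(\mathbb{S}_r)$.
Concretely, its objects are pairs $(m_r, m_l)$ of natural numbers. Its arrows are given by
arrows of $\mathbb{S}_r$ as follows:

\[
\frac{\text{an arrow } F\colon (m_r, m_l) \to (n_r, n_l) \text{ in } \mathbb{S}}
     {\text{an arrow } F\colon m_r + n_l \to n_r + m_l \text{ in } \mathbb{S}_r} \tag{12}
\]

By general properties of Int, $\mathbb{S}$ is a compact closed category (compCC).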


The Int construction applies not only to categories but also to functors. 


Definition 4.10 ($\mathcal{S}$). The solution functor $\mathcal{S}\colon \mathbf{oMDP} \to \mathbb{S}$ is defined by
$\mathcal{S} = \mathrm{Int}(\mathcal{S}_r)$.


The following is our main theorem. 


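Theorem 4.4 (the solution $\mathcal{S}$ is compositional). The solution functor
$\mathcal{S}\colon \mathbf{oMDP} \to \mathbb{S}$ is a compact closed functor, preserving operations $;$, $\oplus$ as in

\[
\mathcal{S}(\mathcal{A} ; \mathcal{B}) = \mathcal{S}(\mathcal{A}) ; \mathcal{S}(\mathcal{B}), \qquad
\mathcal{S}(\mathcal{A} \oplus \mathcal{B}) = \mathcal{S}(\mathcal{A}) \oplus \mathcal{S}(\mathcal{B}).
\]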


We can easily confirm, from Definitions 4.4 and 4.8, that $\mathcal{S}$ computes the
solution we want. Given an open MDP $\mathcal{A}$, an entrance $i$ and an exit $j$, $\mathcal{S}$ returns
the set

\[
\bigl\{ \bigl(\mathrm{RPr}^{\mathrm{MC}(\mathcal{A},\tau)}(i,j),\ \mathrm{ERw}^{\mathrm{MC}(\mathcal{A},\tau)}(i,j)\bigr) \bigm| \tau \text{ is a memoryless scheduler} \bigr\} \tag{13}
\]

of pairs of a reachability probability and expected reward, under different
schedulers, in a passage from $i$ to $j$.

Remark 4.2 (synthesizing an optimal scheduler). The compositional solution
functor $\mathcal{S}$ abstracts away schedulers and only records their results (see (13) where
$\tau$ is not recorded). At the implementation level, we can explicitly record
schedulers so that our compositional algorithm also synthesizes an optimal scheduler.
We do not do so here for theoretical simplicity.


5 Implementation and Experiments 


Meager Semantics. Since our problem is to compute optimal expected rewards,
in our compositional algorithm, we can ignore those intermediate results which
are totally subsumed by other results (i.e. those which come from clearly
suboptimal schedulers). This notion of subsumption is formalized as an order $\le$
between parallel arrows in $\mathbb{S}_r^{\mathrm{MC}}$ (cf. Definition 4.1):
$(p_{i,j}, r_{i,j})_{i,j} \le (p'_{i,j}, r'_{i,j})_{i,j}$ if $p_{i,j} \le p'_{i,j}$ and $r_{i,j} \le r'_{i,j}$ for each $i, j$.
Our implementation works with this meager semantics for better performance;
specifically, it removes elements of $\mathcal{S}_r(\mathcal{A})$ in (11) that are subsumed by others.
It is possible to formulate this meager semantics as categories and functors,
compare it with the semantics in Sect. 4, and prove its correctness. We defer it
to another venue for lack of space.
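
The pruning step of the meager semantics can be pictured as a Pareto filter over a set of arrows. The Python sketch below is only an illustration under assumed conventions (arrows as hashable tuples of rows of `(p, r)` pairs, hypothetical function names); it is quadratic in the number of arrows and is not taken from CompMDP.

```python
def subsumed(f, g):
    """True if arrow f is below arrow g in the entrywise order on (p, r)."""
    return all(
        pf <= pg and rf <= rg
        for row_f, row_g in zip(f, g)
        for (pf, rf), (pg, rg) in zip(row_f, row_g)
    )


def prune(arrows):
    """Keep only arrows that are not subsumed by a different arrow."""
    unique = list(dict.fromkeys(arrows))  # drop exact duplicates, keep order
    return [f for f in unique
            if not any(f != g and subsumed(f, g) for g in unique)]
```

For instance, among the one-entry arrows (1.0, 1.0), (0.5, 1.5) and (0.5, 1.0), the last one is subsumed by both others and is dropped, while the first two are incomparable and both survive.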


Implementation. We implemented the compositional solution functor
$\mathcal{S}\colon \mathbf{oMDP} \to \mathbb{S}$, using the meager semantics as discussed. This prototype
implementation is in Python and called CompMDP.

CompMDP takes a string diagram $\mathcal{A}$ of open MDPs as input; they are
expressed in a textual format that uses operations $;$, $\oplus$ (such as the textual
expression in Definition 2.9). Note that we are abusing notation here, identifying
a string diagram of oMDPs and the composite oMDP $\mathcal{A}$ denoted by it.

Given such input $\mathcal{A}$, CompMDP returns the arrow $\mathcal{S}(\mathcal{A})$, which is concretely
given by pairs of a reachability probability and expected reward shown in (13) 
(we have suboptimal pairs removed, as discussed above). Since different pairs 
correspond to different schedulers, we choose a pair in which the expected reward 
is the greatest. This way we answer the optimal expected reward problem. 


Freezing. In the input format of CompMDP, we have an additional freeze
operator: any expression inside it is considered monolithic, and thus CompMDP
does not solve it compositionally. Those frozen oMDPs (i.e., those expressed by
frozen expressions) are solved by PRISM [13] in our implementation.

Freezing allows us to choose how deep (in the sense of the nesting of string
diagrams) we go compositional. For example, when a component oMDP $\mathcal{A}_0$ is
small but has many loops, fully compositional model checking of $\mathcal{A}_0$ can be more
expensive than (monolithic) PRISM. Freezing is useful in such situations.

We have found experimentally that the degree of freezing often should not be
extremal (i.e. none or all). The optimal degree, which should thus be somewhere
in between, is not known a priori. However, there are not too many options
(the number of layers in the compositional model description), and freezing half
of the layers is recommended, both from our experience and for the purpose of
binary search.

We require that a frozen oMDP should have a unique exit. Otherwise, an
oMDP with a specified exit can have the reachability probability < 1, in which
case PRISM returns ∞ as the expected reward. The latter is different from our
definition of expected reward (Remark 2.2).


Research Questions. We posed the following questions. 


RQ1. Does the compositionality of CompMDP help improve performance? 

RQ2. How much do we benefit from freezing, i.e., a feature that allows us to 
choose the degree of compositionality? 

RQ3. What is the absolute performance of CompMDP? 

RQ4. Does the formalism of string diagrams accommodate real-world models,
enabling their compositional model checking? 

RQ5. On which (compositional) models does CompMDP work well? 


Experiment Setting. We conducted experiments on Apple 2.3 GHz Dual-Core 
Intel Core i5 with 16 GB of RAM. We designed three benchmarks, called Patrol, 
Wholesale, and Packets, as string diagrams of MDPs. Patrol is sketched in Fig. 1; 
it has layers of tasks, rooms, floors, buildings and a neighborhood. 

Wholesale is similar to Patrol, with four layers (item, dispatch, pipeline, 
wholesale), but their transition structures are more complex: they have more 
loops, and more actions are enabled in each position, compared to Patrol. The 
lowest-level component MDP is much larger, too: an item in Wholesale has 5000 
positions, while a task in Patrol has a unique position. 

Packets has two layers: the lower layer models a transmission of 100 packets
with probabilistic failure. The upper layer is a sequence of copies of 2-5 variations
of the lower layer (in total, we have 50 copies), modeling 50 batches of packets.

For Patrol and Wholesale, we conducted experiments with varying degree of 
identification (DI); this can be seen as an ablation study. These benchmarks 
have identical copies of a component MDP in their string diagrams; high DI 
means that these copies are indeed expressed as multiple occurrences of the same 
variable, informing CompMDP to reuse the intermediate solution. As DI goes 
lower, we introduce new variables for these copies and let them look different to 
CompMDP. Specifically, we have twice as many variables for DI-mid, and three
(Patrol) or four (Wholesale) times as many for DI-low, as for DI-high.

For Packets, we conducted experiments with different degrees of freezing 
(FZ). FZ-none indicates no freezing, where our compositional algorithm digs 
all the way down to individual positions as component MDPs. FZ-all freezes 
everything, which means we simply used PRISM (no compositionality). FZ-int. 
(intermediate) freezes the lower of the two layers. Note that this includes the 
performance comparison between CompMDP and PRISM (i.e. FZ-all). 

For Patrol and Wholesale, we also compared the performance of CompMDP
and PRISM using their simple variations Patrol5 and Wholesale5. We did not
use other variations (Patrol1-4 and Wholesale1-4) since the translation of the
models to the PRISM format blew up.

Table 1. Experimental results. |Q| is the number of positions; |E| is the number
of transitions (only counting action branching, not probabilistic branching);
execution time is the average of five runs, in sec.; timeout (TO) is 1200 sec.

                                    exec. time [s]
benchmark     |Q|        |E|        DI-high   DI-mid   DI-low
Patrol1       10^?       10^8       21        42       83
Patrol2       10^8       10^8       23        48       90
Patrol3       10^?       10^?       22        43       89
Patrol4       10^?       10^?       30        60       121
Wholesale1    10^?       2·10^8     130       260      394
Wholesale2    10^?       2·10^?     92        179      274
Wholesale3    2·10^8     4·10^8     6         12       23
Wholesale4    2·10^8     4·10^8     129       260      393

                                    exec. time [s]
benchmark     |Q|        |E|        FZ-none   FZ-int.  FZ-all (PRISM)
Packets1      2.5·10^?   5·10^?     TO        1        65
Packets2      2.5·10^?   5·10^?     TO        3        64
Packets3      2.5·10^?   5·10^?     TO        1        56
Packets4      2.5·10^?   5·10^?     TO        3        56
Patrol5       10^?       10^?       22        22       TO
Wholesale5    5·10^7     10^8       TO        14       TO


Results and Discussion. Table 1 summarizes the experiment results. 


RQ1. A big advantage of compositional verification is that it can reuse
intermediate results. This advantage is clearly observed in the ablation experiments
with the benchmarks Patrol1-4 and Wholesale1-4: as the degree of reuse goes
down to 1/2 and to 1/3-1/4 (see above), the execution time grows in inverse
proportion. Moreover, with the benchmarks Packets1-4, Patrol5 and Wholesale5, we see
that compositionality greatly improves performance, compared to PRISM
(FZ-all). Overall, we can say that compositionality has clear performance advantages
in probabilistic model checking.


RQ2. The Packets experiments show that controlling the degree of
compositionality is important. The lower layer of Packets (frozen in FZ-int.) is a large
and complex model, without a clear compositional structure; its fully compositional
treatment turned out to be prohibitively expensive. The performance
advantage of FZ-int. compared to PRISM (FZ-all) is encouraging. The Patrol5 and
Wholesale5 experiments also show the advantage of compositionality.


RQ3. We find the absolute performance of CompMDP quite satisfactory. The 
Patrol and Wholesale benchmarks are huge models, with so many positions 
that fitting their explicit state representation in memory is already nontrivial. 
CompMDP, exploiting their succinct presentation by string diagrams, success- 
fully model-checked them in realistic time (6-130s with DI-high). 


RQ4. The experiments suggest that string diagrams are a practical modeling 
formalism, allowing faster solutions of realistic benchmarks. It seems likely that 
the formalism is more suited for task compositionality (where components are 
sub-tasks and they are sequentially composed with possible fallbacks and loops) 
rather than system compositionality (where components are sub-systems and 
they are parallelly composed). 


RQ5. It seems that the number of locally optimal schedulers is an important 
factor: if there are many of them, then we have to record more in the intermediate 
solutions of the meager semantics. This number typically increases when more
actions are available, as the comparison between Patrol and Wholesale shows.

Abstract. We provide a novel method for sensitivity analysis of parametric
robust Markov chains. These models incorporate parameters and
sets of probability distributions to alleviate the often unrealistic assump- 
tion that precise probabilities are available. We measure sensitivity in 
terms of partial derivatives with respect to the uncertain transition prob- 
abilities regarding measures such as the expected reward. As our main 
contribution, we present an efficient method to compute these partial 
derivatives. To scale our approach to models with thousands of parame- 
ters, we present an extension of this method that selects the subset of k 
parameters with the highest partial derivative. Our methods are based 
on linear programming and differentiating these programs around a given 
value for the parameters. The experiments show the applicability of our 
approach on models with over a million states and thousands of parame- 
ters. Moreover, we embed the results within an iterative learning scheme 
that profits from having access to a dedicated sensitivity analysis. 


1 Introduction 


Discrete-time Markov chains (MCs) are ubiquitous in stochastic systems mod- 
eling [8]. A classical assumption is that all probabilities of an MC are pre- 
cisely known—an assumption that is difficult, if not impossible, to satisfy in 
practice [4]. Robust MCs (rMCs), or uncertain MCs, alleviate this assumption 
by using sets of probability distributions, e.g., intervals of probabilities in the 
simplest case [12,39]. A typical verification problem for rMCs is to compute 
upper or lower bounds on measures of interest, such as the expected cumula- 
tive reward, under worst-case realizations of these probabilities in the set of 
distributions [52,59]. Thus, verification results are robust against any selection 
of probabilities in these sets. 

Where to improve my model? As a running example, consider a ground vehicle
navigating toward a target location in an environment with different terrain 
types. On each terrain type, there is some probability that the vehicle will slip 
and fail to move. Assume that we obtain a sufficient number of samples to 
infer upper and lower bounds (i.e., intervals) on the slipping probability on each 
terrain. We use these probability intervals to model the grid world as an rMC. 
However, from the rMC, it is unclear how our model (and thus the measure of 
interest) will change if we obtain more samples. For instance, if we take one more 
sample for a particular terrain, some of the intervals of the rMC will change, but 
how can we expect the verification result to change? And if the verification result 
is unsatisfactory, for which terrain type should we obtain more samples? 


Parametric Robust MCs. To reason about how additional samples will change 
our model and thus the verification result, we employ a sensitivity analysis [29]. 
To that end, we use parametric robust MCs (prMCs), which are rMCs whose sets 
of probability distributions are defined as a function of a set of parameters [26], 
e.g., intervals with parametric upper/lower bounds. With these functions over 
the parameters, we can describe dependencies between the model’s states. The 
assignment of values to each of the parameters is called an instantiation. Apply- 
ing an instantiation to a prMC induces an rMC by replacing each occurrence of 
the parameters with their assigned values. For this induced rMC, we compute 
a (robust) value for a given measure, and we call this verification result the 
solution for this instantiation. Thus, we can associate a prMC with a function, 
called the solution function, that maps parameter instantiations to values. 


Differentiation for prMCs. For our running example, we choose the parameters to
represent the number of samples we have obtained for each terrain. Naturally, the
derivative of this solution function with respect to each parameter (i.e., each
sample size) then corresponds to the expected change in the solution upon obtaining
more samples. Such differentiation for parametric MCs (pMCs), where parameter 
instantiations yield one precise probability distribution, has been studied in [34]. 
For prMCs, however, it is unclear how to compute derivatives and under what 
conditions the derivative exists. We thus consider the following problem: 


Problem 1 (Computing derivatives). Given a prMC and a parameter instanti- 
ation, compute the partial derivative of the solution function (evaluated at 
this instantiation) with respect to each of the parameters. 


Our Approach. We compute derivatives for prMCs by solving a parameterized 
linear optimization problem. We build upon results from convex optimization 
theory for differentiating the optimal solution of this optimization problem [9, 15]. 
We also present sufficient conditions for the derivative to exist. 


Improving Efficiency. However, computing the derivative for every parameter 
explicitly does not scale to more realistic models with thousands of parameters. 
Instead, we observe that to determine for which parameter we should obtain more 
samples, we do not need to know all partial derivatives explicitly. Instead, it may 
suffice to know which parameters have the highest (or lowest, depending on the 
application) derivative. Thus, we also solve the following (related) problem: 

Fig. 1. Grid world environment (a). The vehicle must deliver the package to the
warehouse. We obtain the MLEs in (b), leading to the MC in (c). Panels: (a) Grid
world; (b) MLEs and derivatives; (c) Portion of the MC.


Problem 2 (k-highest derivatives). Given a prMC with |V| parameters, deter- 
mine the k < |V| parameters with the highest (or lowest) partial derivative. 


We develop novel and efficient methods for solving Problem 2. Concretely, we 
design a linear program (LP) that finds the k parameters with the highest (or 
lowest) partial derivative without computing all derivatives explicitly. This LP 
constitutes a polynomial-time algorithm for Problem 2 and is, in practice, orders 
of magnitude faster than computing all derivatives explicitly, especially if the 
number of parameters is high. Moreover, if the concrete values for the partial 
derivatives are required, one can additionally solve Problem 1 for only the result- 
ing k parameters. In our experiments, we show that we can compute derivatives 
for models with over a million states and thousands of parameters. 


Learning Framework. Learning in stochastic environments is very data-intensive 
in general, and millions of samples may be required to obtain sufficiently tight 
bounds on measures of interest [43,47]. Several methods exist to obtain intervals 
on probabilities based on sampling, including statistical methods such as Hoeffd- 
ing’s inequality [14] and Bayesian methods that iteratively update intervals [57]. 
Motivated by this challenge of reducing the sample complexity of learning algo- 
rithms, we embed our methods in an iterative learning scheme that profits from 
having access to sensitivity values for the parameters. In our experiments, we 
show that derivative information can be used effectively to guide sampling when 
learning an unknown Markov chain with hundreds of parameters. 


Contributions. Our contributions are threefold: (1) We present a first algorithm 
to compute partial derivatives for prMCs. (2) For both pMCs and prMCs, we 
develop an efficient method to determine a subset of parameters with the highest 
derivatives. (3) We apply our methods in an iterative learning scheme. We give 
an overview of our approach in Sect. 2 and formalize the problem statement in
Sect. 3. In Sect. 4, we solve Problems 1 and 2 for pMCs, and in Sect. 5 for
prMCs. Finally, the learning scheme and experiments are in Sect. 6.


2 Overview 
We expand the example from Sect. 1 to illustrate our approach more concretely. 


The environment, shown in Fig. 1a, is partitioned into five regions of the same
terrain type. The vehicle can move in the four cardinal directions. Recall that
the slipping probabilities are the same for all states with the same terrain. The
vehicle follows a dedicated route to collect and deliver a package to a warehouse.
Our goal is to estimate the expected number of steps f* to complete the mission.

Fig. 2. Parametric MC. Fig. 3. Parametric robust MC.


Estimating Probabilities. Classically, we would derive maximum likelihood esti- 
mates (MLEs) of the probabilities by sampling. Consider that, using N samples 
per slipping probability, we obtained the rough MLEs shown in Fig. 1b and thus
the MC in Fig. 1c. Verifying the MC shows that the expected travel time (called
the solution) under these estimates is f = 25.51 steps, which is far from the 
travel time of f* = 21.62 steps under the true slipping probabilities. We want to 
close this verification-to-real gap by taking more samples for one of the terrain 
types. For which of the five terrain types should we obtain more samples? 


Parametric Model. We can model the grid world as a pMC, i.e., an MC with 
symbolic probabilities. The solution function for this pMC is the travel time $f$,
being a function of these symbolic probabilities. We sketch four states of this
pMC in Fig. 2. The most relevant parameter is then naturally defined as the
parameter with the largest partial derivative of the solution function. As shown
in Fig. 1b, parameter $v_4$ has the highest partial derivative of
$\partial f / \partial v_4 = 22.96$, while the derivative of $v_3$ is zero as no states related
to this parameter are ever visited.


Parametric Robust Model. The approach above does not account for the
uncertainty in each MLE. Terrain type $v_4$ has the highest derivative but also the largest
sample size, so sampling $v_4$ once more likely has less impact than for, e.g., $v_1$. So,
is $v_4$ actually the best choice to obtain additional samples for? The prMC that
allows us to answer this question is shown in Fig. 3, where we use (parametric) 
intervals as uncertainty sets. The parameters are the sample sizes N1, ..., N5 
for all terrain types (contrary to the pMC, where parameters represent slipping 
probabilities). Now, if we obtain one additional sample for a particular terrain 
type, how can we expect the uncertainty sets to change? 


Derivatives for prMCs. We use the prMC to compute an upper bound $f^{+}$ on the
true solution $f^{\star}$. Obtaining one more sample for terrain type $v_i$ (i.e., increasing
$N_i$ by one) shrinks the interval $[\underline{g}(N_i), \overline{g}(N_i)]$ in expectation, which in turn
decreases our upper bound $f^{+}$. Here, $\underline{g}$ and $\overline{g}$ are functions mapping sample
sizes to interval bounds. The partial derivatives $\partial f^{+} / \partial N_i$ for the prMC are also
shown in Fig. 1b and give a very different outcome than the derivatives for the
pMC. In fact, sampling $v_1$ yields the biggest decrease in the upper bound $f^{+}$,
so we ultimately decide to sample for terrain type $v_1$ instead of $v_4$.


Efficient Differentiation. We remark that we do not need to know all derivatives 
explicitly to determine where to obtain samples. Instead, it suffices to know 
which parameter has the highest (or lowest) derivative. In the rest of the paper, 
we develop efficient methods for computing either all or only the $k \in \mathbb{N}$ highest
partial derivatives of the solution functions for pMCs and prMCs. 


Supported Extensions. Our approaches are applicable to general pMCs and prMCs 
whose parameters can be shared between distributions (and thus capture depen- 
dencies, being a common advantage of parametric models in general [40]). Besides 
parameters in transition probabilities, we can handle parametric initial states, 
rewards, and policies. We could, e.g., use parameters to model the policy of a 
surveillance drone in our example and compute derivatives for these parameters. 


3 Formal Problem Statement 


Let $V = \{v_1, \ldots, v_\ell\}$, $v_i \in \mathbb{R}$ be a finite and ordered set of parameters. A
parameter instantiation is a function $u\colon V \to \mathbb{R}$ that maps a parameter to a real
valuation. The vector function $u(v_1, \ldots, v_\ell) = [u(v_1), \ldots, u(v_\ell)]^{\top} \in \mathbb{R}^{\ell}$ denotes
an ordered instantiation of all parameters in $V$ through $u$. The set of polynomials
over the parameters $V$ is $\mathbb{Q}[V]$. A polynomial $f$ can be interpreted as a function
$f\colon \mathbb{R}^{\ell} \to \mathbb{R}$ where $f(u)$ is obtained by substituting each occurrence of $v$ by $u(v)$.
We denote these substitutions with $f[u]$.

For any set $X$, let $\mathrm{pFun}_V(X) = \{f \mid f\colon X \to \mathbb{Q}[V]\}$ be the set of functions
that map from $X$ to the polynomials over the parameters $V$. We denote by
$\mathrm{pDist}_V(X) \subset \mathrm{pFun}_V(X)$ the set of parametric probability distributions over $X$,
i.e., the functions $f\colon X \to \mathbb{Q}[V]$ such that $f(x)[u] \in [0,1]$ and $\sum_{x\in X} f(x)[u] = 1$
for all parameter instantiations $u$.


Parametric Markov Chain. We define a pMC as follows: 


Definition 1 (pMC). A pMC $\mathcal{M}$ is a tuple $(S, s_I, V, P)$, where $S$ is a finite
set of states, $s_I \in \mathrm{Dist}(S)$ a distribution over initial states, $V$ a finite set of
parameters, and $P\colon S \to \mathrm{pDist}_V(S)$ a parametric transition function.


Applying an instantiation u to a pMC yields an MC M[u] by replacing each 
transition probability f € Q[V] by f[u]. We consider expected reward mea- 
sures based on a state reward function R: S — R. Each parameter instantia- 
tion for a pMC yields an MC for which we can compute the solution for the 
expected reward measure [8]. We call the function that maps instantiations to 
a solution the solution function. The solution function is smooth over the set of 
graph-preserving instantiations [41]. Concretely, the solution function sol for the 
expected cumulative reward under instantiation u is written as follows: 


\[
\mathrm{sol}(u) = \sum_{s\in S} \Bigl( s_I(s) \sum_{\omega\in\Omega(s)} \mathrm{rew}(\omega) \cdot \Pr(\omega, u) \Bigr), \tag{1}
\]

where $\Omega(s)$ is the set of paths starting in $s \in S$, $\mathrm{rew}(\omega) = R(s_0) + R(s_1) + \cdots$
is the cumulative reward over $\omega = s_0 s_1 \cdots$, and $\Pr(\omega, u)$ is the probability for
a path $\omega \in \Omega(s)$. If a terminal (sink) state is reached from state $s \in S$ with
probability one, the infinite sum over $\omega \in \Omega(s)$ in Eq. (1) exists [53].
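
To get a feel for the solution function and its partial derivatives, the following Python sketch evaluates sol(u) on a small instantiated pMC by fixed-point iteration and approximates a derivative by central finite differences. This is a numerical illustration only and not the method developed in this paper (which is based on linear programming); the model encoding, the function names, and the two-terrain example are assumptions.

```python
def expected_reward(trans, reward, init, iters=2000):
    """Expected cumulative reward of an instantiated MC.

    trans[s] maps successors to probabilities; terminal states map to {}.
    Assumes a terminal state is reached with probability one.
    """
    v = {s: 0.0 for s in trans}
    for _ in range(iters):
        v = {s: (reward[s] + sum(q * v[t] for t, q in succ.items())) if succ else 0.0
             for s, succ in trans.items()}
    return sum(prob * v[s] for s, prob in init.items())


def partial_derivative(pmc, reward, init, u, name, h=1e-6):
    """Central finite-difference estimate of d sol / d name at instantiation u."""
    up = dict(u)
    up[name] = u[name] + h
    down = dict(u)
    down[name] = u[name] - h
    return (expected_reward(pmc(up), reward, init)
            - expected_reward(pmc(down), reward, init)) / (2 * h)


def two_terrain_pmc(u):
    """Hypothetical pMC: slip (stay put) with probability v1 in s0 and v2 in s1."""
    return {
        "s0": {"s0": u["v1"], "s1": 1.0 - u["v1"]},
        "s1": {"s1": u["v2"], "goal": 1.0 - u["v2"]},
        "goal": {},
    }
```

With reward 1 per step in `s0` and `s1`, the solution function of this example is 1/(1 - v1) + 1/(1 - v2), so at v1 = 0.2 and v2 = 0.5 the estimates should be close to 3.25 for the solution, and to 1.5625 and 4 for the two partial derivatives.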


Parametric Robust Markov Chains. The convex polytope $T_{A,b} \subseteq \mathbb{R}^{n}$
defined by matrix $A \in \mathbb{R}^{m\times n}$ and vector $b \in \mathbb{R}^{m}$ is the set
$T_{A,b} = \{p \in \mathbb{R}^{n} \mid Ap \le b\}$. We denote by $\mathbb{T}_n$ the set of all convex polytopes of
dimension $n$, i.e.,

\[
\mathbb{T}_n = \{T_{A,b} \mid A \in \mathbb{R}^{m\times n},\ b \in \mathbb{R}^{m},\ m \in \mathbb{N}\}. \tag{2}
\]

A robust MC (rMC) [54,58] is a tuple $(S, s_I, \mathcal{P})$, where $S$ and $s_I$ are defined as for
pMCs and the uncertain transition function $\mathcal{P}\colon S \to \mathbb{T}_{|S|}$ maps states to convex
polytopes $T \in \mathbb{T}_{|S|}$. Intuitively, an rMC is an MC with possibly infinite sets of
probability distributions. To obtain robust bounds on the verification result for
any of these MCs, an adversary nondeterministically chooses a precise transition
function by fixing a probability distribution $P(s) \in \mathcal{P}(s)$ for each $s \in S$.

We extend rMCs with polytopes whose halfspaces are defined by polynomials 
Q|V] over V. To this end, let T,,[V] be the set of all such parametric polytopes: 


Ta[V] = {Tap | A € QIV]™*”, b € Q(V]™, m €N}. (3) 


An element T € T,,[V] can be interpreted as a function T: Rf > 2") that 
maps an instantiation u to a (possibly empty) convex polytopic subset of R”. 
The set T [ul] is obtained by substituting each v; in T by u(v;) for alli =1,..., 2. 


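Example 1. The uncertainty set for state $s_1$ of the prMC in Fig. 3 is the parametric
polytope $T \in \mathbb{T}_2[V]$ with singleton parameter set $V = \{N_1\}$, such that


\[ T = \big\{ [p_{1,1}, p_{1,2}]^\top \in \mathbb{R}^2 \;\big|\; \underline{g}_1(N_1) \leq p_{1,1} \leq \bar{g}_1(N_1),\ 1 - \bar{g}_1(N_1) \leq p_{1,2} \leq 1 - \underline{g}_1(N_1),\ p_{1,1} + p_{1,2} = 1 \big\}. \]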


We use parametric convex polytopes to define prMCs: 


Definition 2 (prMC). A prMC $\mathcal{M}_R$ is a tuple $(S, s_I, V, \mathcal{P})$, where $S$, $s_I$, and $V$
are defined as for pMCs (Def. 1), and where $\mathcal{P} \colon S \to \mathbb{T}_{|S|}[V]$ is a parametric and
uncertain transition function that maps states to parametric convex polytopes.


Applying an instantiation $u$ to a prMC yields an rMC $\mathcal{M}_R[u]$ by replacing each
parametric polytope $T \in \mathbb{T}_{|S|}[V]$ by $T[u]$, i.e., a polytope defined by a concrete
matrix $A \in \mathbb{R}^{m \times n}$ and vector $b \in \mathbb{R}^m$. Without loss of generality, we consider
adversaries minimizing the expected cumulative reward until reaching a set of
terminal states $S_T \subseteq S$. This minimum expected cumulative reward $\mathrm{sol}_R(u)$,
called the robust solution on the instantiated prMC $\mathcal{M}_R[u]$, is defined as


\[ \mathrm{sol}_R(u) = \sum_{s \in S} \Big( s_I(s) \cdot \min_{P \in \mathcal{P}[u]} \sum_{\omega \in \Omega(s)} \mathrm{rew}(\omega) \cdot \Pr(\omega, u, P) \Big). \tag{4} \]


We refer to the function $\mathrm{sol}_R \colon \mathbb{R}^\ell \to \mathbb{R}$ as the robust solution function.


Assumptions on pMCs and prMCs. For both pMCs and prMCs, we assume that
transitions cannot vanish under any instantiation (graph-preservation). That is,
for every $s, s' \in S$, we have that $P(s)[u](s')$ (for pMCs) and $\mathcal{P}(s)[u](s')$ (for
prMCs) are either zero or strictly positive for all instantiations $u$.


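Problem Statement. Let $f(q_1, \ldots, q_n) \in \mathbb{R}^m$ be a differentiable multivariate
function with $m \in \mathbb{N}$. We denote the partial derivative of $f$ with respect to $q_i$ by
$\frac{\partial f}{\partial q_i} \in \mathbb{R}^m$. The gradient of $f$ combines all partial derivatives in a single vector as
$\nabla_q f = \big[ \frac{\partial f}{\partial q_1}, \ldots, \frac{\partial f}{\partial q_n} \big] \in \mathbb{R}^{m \times n}$. We only use gradients $\nabla_u f$ with respect to the
parameter instantiation $u$, so we simply write $\nabla f$ in the remainder.


The gradient of the robust solution function evaluated at the instantiation $u$
is $\nabla \mathrm{sol}_R[u] = \big[ \big(\frac{\partial \mathrm{sol}_R}{\partial u(v_1)}\big)[u], \ldots, \big(\frac{\partial \mathrm{sol}_R}{\partial u(v_\ell)}\big)[u] \big]$. We solve the following problem.


Problem 1. Given a prMC $\mathcal{M}_R$ and a parameter instantiation $u$, compute
the gradient $\nabla \mathrm{sol}_R[u]$ of the robust solution function evaluated at $u$.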


Solving Problem 1 is linear in the number of parameters, which may lead to 
significant overhead if the number of parameters is large. Typically, it suffices to 
only obtain the parameters with the highest derivatives: 


Problem 2. Given a prMC $\mathcal{M}_R$, an instantiation $u$, and a $k \leq |V|$, compute
a subset $V^\star$ of $k$ parameters for which the partial derivatives are maximal.


For both problems, we present polynomial-time algorithms for pMCs (Sect. 4)
and prMCs (Sect. 5). Section 6 defines problem variations that we study
empirically.


4 Differentiating Solution Functions for pMCs 


We can compute the solution of an MC $\mathcal{M}[u]$ with instantiation $u$ based on a
system of $|S|$ linear equations; here for an expected reward measure [8]. Let
$x = [x_{s_1}, \ldots, x_{s_{|S|}}]^\top$ and $r = [r_{s_1}, \ldots, r_{s_{|S|}}]^\top$ be variables for the expected cumulative
reward and the instantaneous reward in each state $s \in S$, respectively. Then, for
a set of terminal (sink) states $S_T \subseteq S$, we obtain the equation system


\[ x_s = 0, \quad \forall s \in S_T \tag{5a} \]
\[ x_s = r_s + P(s)[u]\, x, \quad \forall s \in S \setminus S_T. \tag{5b} \]


Let us set $P(s)[u] = 0$ for all $s \in S_T$ and define the matrix $P[u] \in \mathbb{R}^{|S| \times |S|}$ by
stacking the rows $P(s)[u]$ for all $s \in S$. Then, Eq. (5) is written in matrix form
as $(I_{|S|} - P[u])\, x = r$. The equation system in Eq. (5) can be efficiently solved
by, e.g., Gaussian elimination or more advanced iterative equation solvers.
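As an illustration of the matrix form of Eq. (5), the following minimal sketch solves $(I - P[u])x = r$ for a three-state toy chain with SciPy's sparse direct solver. The function name `expected_reward`, the toy transition matrix, and the rewards are assumptions made for this example only; they are not taken from the benchmarks or the authors' implementation.

```python
import numpy as np
from scipy.sparse import csc_matrix, identity
from scipy.sparse.linalg import spsolve


def expected_reward(P, r, terminal):
    """Solve (I - P[u]) x = r, with P(s)[u] = 0 and r_s = 0 for terminal states."""
    P = np.array(P, dtype=float)
    r = np.array(r, dtype=float)
    P[terminal, :] = 0.0   # rows of terminal (sink) states are set to zero
    r[terminal] = 0.0      # so that x_s = 0 holds for every terminal state
    A = (identity(len(r), format="csc") - csc_matrix(P)).tocsc()
    return spsolve(A, r)


# Hypothetical instantiated MC: s0 loops w.p. 0.2, moves to s1 w.p. 0.8;
# s1 moves to the terminal state s2 w.p. 1.
P = [[0.2, 0.8, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
r = [1.0, 2.0, 0.0]
x = expected_reward(P, r, terminal=[2])
```

For this toy chain, $x_{s_1} = 2$ and $x_{s_0} = (1 + 0.8 \cdot 2) / 0.8 = 3.25$, which can be checked by hand against Eq. (5b).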


4.1 Computing Derivatives Explicitly 


We differentiate the equation system in Eq. (5) with respect to an instantiation
$u(v_i)$ for parameter $v_i \in V$, similar to, e.g., [34]. For all $s \in S_T$, the derivative
$\frac{\partial x_s}{\partial u(v_i)}$ is trivially zero. For all $s \in S \setminus S_T$, we obtain via the product rule that


\[ \frac{\partial x_s}{\partial u(v_i)} = \frac{\partial P(s)\, x^\star}{\partial u(v_i)}[u] + P(s)[u]\, \frac{\partial x}{\partial u(v_i)}, \tag{6} \]


where $x^\star \in \mathbb{R}^{|S|}$ is the solution to Eq. (5). In matrix form for all $s \in S$, this
yields


\[ (I_{|S|} - P[u])\, \frac{\partial x}{\partial u(v_i)} = \frac{\partial P x^\star}{\partial u(v_i)}[u]. \tag{7} \]


The solution defined in Eq. (1) is computed as $\mathrm{sol}[u] = s_I^\top x^\star$. Thus, the partial
derivative of the solution function with respect to $u(v_i)$ in closed form is


\[ \Big( \frac{\partial \mathrm{sol}}{\partial u(v_i)} \Big)[u] = s_I^\top \frac{\partial x}{\partial u(v_i)} = s_I^\top (I_{|S|} - P[u])^{-1}\, \frac{\partial P x^\star}{\partial u(v_i)}[u]. \tag{8} \]


Algorithm for Problem 1. Let us provide an algorithm to solve Problem 1 for pMCs.
Eq. (8) provides a closed-form expression for the partial derivative of the solution
function, which is a function of the vector $x^\star$ in Eq. (5). However, due to the
inversion of $(I_{|S|} - P[u])$, it is generally more efficient to solve the system of
equations in Eq. (7). Doing so, the partial derivative of the solution with respect
to $u(v_i)$ is obtained by: (1) solving Eq. (5) with $u$ to obtain $x^\star \in \mathbb{R}^{|S|}$, and (2)
solving the equation system in Eq. (7) with $|S|$ unknowns for this vector $x^\star$. We
repeat step 2 for all of the $|V|$ parameters. Thus, we can solve Problem 1 by
solving $|V| + 1$ linear equation systems with $|S|$ unknowns each.
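The two steps above can be sketched as follows. This is a small dense NumPy illustration under stated assumptions, not the authors' implementation: the function `pmc_gradient` and its arguments are hypothetical, and the Jacobians `dP[i]` (the entrywise derivative of the transition matrix with respect to parameter $v_i$, evaluated at $u$) are assumed to be supplied by the caller.

```python
import numpy as np


def pmc_gradient(P, dP, r, s_init, terminal):
    """Return sol[u] and its gradient by solving |V| + 1 linear systems.

    P      : (n, n) instantiated transition matrix P[u]
    dP     : list of (n, n) matrices; dP[i] is dP/du(v_i) evaluated at u
    r      : (n,) state rewards
    s_init : (n,) initial distribution s_I
    """
    P = np.array(P, dtype=float)
    r = np.array(r, dtype=float)
    s_init = np.asarray(s_init, dtype=float)
    P[terminal, :] = 0.0
    r[terminal] = 0.0
    A = np.eye(len(r)) - P
    x_star = np.linalg.solve(A, r)              # step 1: Eq. (5)
    grad = []
    for dPi in dP:                              # step 2: Eq. (7), once per parameter
        dPi = np.array(dPi, dtype=float)
        dPi[terminal, :] = 0.0
        dx = np.linalg.solve(A, dPi @ x_star)
        grad.append(s_init @ dx)
    return s_init @ x_star, np.array(grad)


# Hypothetical one-parameter pMC: s0 loops w.p. v and reaches terminal s1 w.p. 1 - v.
v = 0.5
P = [[v, 1.0 - v], [0.0, 1.0]]
dP = [[[1.0, -1.0], [0.0, 0.0]]]
sol, grad = pmc_gradient(P, dP, r=[1.0, 0.0], s_init=[1.0, 0.0], terminal=[1])
```

Here the solution function is $1/(1-v)$ in closed form, so at $v = 0.5$ the solution is 2 and the derivative $1/(1-v)^2$ is 4, which gives a simple hand check of the linear-system approach. For large models one would replace the dense solves by a sparse factorization that is reused across parameters.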


4.2 Computing k-Highest Derivatives 


To solve Problem 2 for pMCs, we present a method to compute only the
$k \leq \ell = |V|$ parameters with the highest (or lowest) partial derivative without
computing all derivatives explicitly. Without loss of generality, we focus on
the highest derivative. We can determine these parameters by solving a combinatorial
optimization problem with binary variables $z_i \in \{0, 1\}$ for $i = 1, \ldots, \ell$.
Our goal is to formulate this optimization problem such that an optimal value of
$z_i^\star = 1$ implies that parameter $v_i \in V$ belongs to the set of $k$ highest derivatives.
Concretely, we formulate the following mixed integer linear problem (MILP) [60]:


\[ \underset{y \in \mathbb{R}^{|S|},\ z \in \{0,1\}^\ell}{\text{maximize}} \quad s_I^\top y \tag{9a} \]
\[ \text{subject to} \quad (I_{|S|} - P[u])\, y = \sum_{i=1}^{\ell} z_i\, \frac{\partial P x^\star}{\partial u(v_i)}[u] \tag{9b} \]
\[ z_1 + \cdots + z_\ell = k. \tag{9c} \]


Constraint (9c) ensures that any feasible solution to Eq. (9) has exactly $k$ nonzero
entries. Since matrix $(I_{|S|} - P[u])$ is invertible by construction (see, e.g., [53]), Eq.
(9) has a unique solution in $y$ for each choice of $z \in \{0, 1\}^\ell$. Thus, the objective
value $s_I^\top y$ is the sum of the derivatives for the parameters $v_i \in V$ for which
$z_i = 1$. Since we maximize this objective, an optimal solution $y^\star, z^\star$ to Eq. (9) is
guaranteed to correspond to the $k$ parameters that maximize the derivative of
the solution in Eq. (8). We state this correctness claim for the MILP:


Proposition 1. Let $y^\star, z^\star$ be an optimal solution to Eq. (9). Then, the set
$V^\star = \{v_i \in V \mid z_i^\star = 1\}$ is a subset of $k \leq \ell$ parameters with maximal derivatives.


The set $V^\star$ may not be unique. However, to solve Problem 2, it suffices to obtain
a set of $k$ parameters for which the partial derivatives are maximal. Therefore,
the set $V^\star$ provides a solution to Problem 2. We remark that, to solve Problem 2
for the $k$ lowest derivatives, we change the objective in Eq. (9a) to minimize $s_I^\top y$.


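Linear Relaxation. The MILP in Eq. (9) is computationally intractable for high
values of $\ell$ and $k$. Instead, we compute the set $V^\star$ via a linear relaxation of
the MILP. Specifically, we relax the binary variables $z \in \{0, 1\}^\ell$ to continuous
variables $z \in [0, 1]^\ell$. As such, we obtain the following LP relaxation of Eq. (9):


\[ \underset{y \in \mathbb{R}^{|S|},\ z \in \mathbb{R}^\ell}{\text{maximize}} \quad s_I^\top y \tag{10a} \]
\[ \text{subject to} \quad (I_{|S|} - P[u])\, y = \sum_{i=1}^{\ell} z_i\, \frac{\partial P x^\star}{\partial u(v_i)}[u] \tag{10b} \]
\[ 0 \leq z_i \leq 1, \quad \forall i = 1, \ldots, \ell \tag{10c} \]
\[ z_1 + \cdots + z_\ell = k. \tag{10d} \]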


Denote by $y^+, z^+$ the solution of the LP relaxation in Eq. (10). For details on
such linear relaxations of integer problems, we refer to [36,46]. In our case, every
optimal solution $y^+, z^+$ to the LP relaxation with only binary values $z_i^+ \in \{0, 1\}$
is also optimal for the MILP, resulting in the following theorem.


Theorem 1. The LP relaxation in Eq. (10) has an optimal solution $y^+, z^+$
with $z^+ \in \{0, 1\}^\ell$ (i.e., every optimal variable $z_i^+$ is binary), and every such
solution is also an optimal solution of the MILP in Eq. (9).


Proof. From invertibility of $(I_{|S|} - P[u])$, we know that Eq. (9) is equivalent to


\[ \underset{z \in \{0,1\}^\ell}{\text{maximize}} \quad \sum_{i=1}^{\ell} z_i \Big( s_I^\top (I_{|S|} - P[u])^{-1}\, \frac{\partial P x^\star}{\partial u(v_i)}[u] \Big) \tag{11a} \]
\[ \text{subject to} \quad z_1 + \cdots + z_\ell = k. \tag{11b} \]


The linear relaxation of Eq. (11) is an LP whose feasible region has integer
vertices (see, e.g., [37]). Therefore, both Eq. (11) and its relaxation Eq. (10)
have an integer optimal solution $z^+$, which constructs $z^\star$ in Eq. (9).
The binary solutions $z^+ \in \{0, 1\}^\ell$ are the vertices of the feasible set of the
LP in Eq. (10). A simplex-based LP solver can be set to return such a solution.¹


Algorithm for Problem 2. We provide an algorithm to solve Problem 2 for pMCs
consisting of two steps. First, for pMC $\mathcal{M}$ and parameter instantiation $u$, we solve
the linear equation system in Eq. (5) for $x^\star$ to obtain the solution $\mathrm{sol}[u] = s_I^\top x^\star$.
Second, we fix a number of parameters $k \leq \ell$ and solve the LP relaxation in
Eq. (10). The set $V^\star$ of parameters with maximal derivatives is then obtained as
defined in Proposition 1. The parameter set $V^\star$ is a solution to Problem 2.
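To make the integrality argument concrete, the sketch below solves the relaxation of the reduced problem in Eq. (11), i.e., maximize $c^\top z$ subject to $\sum_i z_i = k$ and $0 \leq z_i \leq 1$, where $c_i$ stands for the $i$-th partial derivative. It assumes the coefficients $c_i$ are already known, which defeats the purpose in practice (the LP in Eq. (10) avoids computing them) but shows that the LP optimum selects exactly the $k$ largest entries. The function name `top_k_via_lp` and the derivative values are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def top_k_via_lp(c, k):
    """Relaxed Eq. (11): maximize c^T z  s.t.  sum(z) = k,  0 <= z_i <= 1."""
    c = np.asarray(c, dtype=float)
    res = linprog(-c,                                   # linprog minimizes, so negate
                  A_eq=np.ones((1, c.size)), b_eq=[k],
                  bounds=[(0.0, 1.0)] * c.size,
                  method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    # Round to guard against tiny floating-point deviations from 0 and 1.
    return {i for i, z_i in enumerate(res.x) if z_i > 0.5}


derivs = [0.3, -1.2, 2.5, 0.9, 1.7]   # hypothetical partial derivatives
chosen = top_k_via_lp(derivs, k=2)
reference = {int(i) for i in np.argsort(derivs)[-2:]}   # plain sorting, for comparison
```

Because the five values are distinct, the LP optimum is unique and coincides with the two largest derivatives found by sorting. With ties, several optimal vertices exist and a solver may return any of them (or a non-vertex point), which is the situation the tie-break footnote addresses.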


5 Differentiating Solution Functions for prMCs 


We shift focus to prMCs. Recall that solutions $\mathrm{sol}_R[u]$ are computed for the
worst-case realization of the uncertainty, called the robust solution. We derive
the following equation system, where, as for pMCs, $x \in \mathbb{R}^{|S|}$ represents the
expected cumulative reward in each state.


\[ x_s = 0, \quad \forall s \in S_T \tag{12a} \]
\[ x_s = r_s + \inf_{p \in \mathcal{P}(s)[u]} \big( p^\top x \big), \quad \forall s \in S \setminus S_T. \tag{12b} \]


Solving Eq. (12) directly corresponds to solving a system of nonlinear equations
due to the inner infimum in Eq. (12b). The standard approach from robust
optimization [12] is to leverage the dual problem for each inner infimum, e.g., as
is done in [20,52]. For each $s \in S$, $\mathcal{P}(s)$ is a parametric convex polytope $T_{A,b}$ as
defined in Eq. (3). The dimensionality of this polytope depends on the number of
successor states, which is typically much lower than the total number of states.
To make the number of successor states explicit, we denote by $\mathrm{post}(s) \subseteq S$ the
successor states of $s \in S$ and define $T_{A,b} \in \mathbb{T}_{|\mathrm{post}(s)|}[V]$ with $A_s \in \mathbb{Q}^{m_s \times |\mathrm{post}(s)|}$
and $b_s[u] \in \mathbb{Q}^{m_s}$ (recall $m_s$ is the number of halfspaces of the polytope). Then,
the infimum in Eq. (12b) for each $s \in S \setminus S_T$ is


\[ \text{minimize} \quad p^\top x \tag{13a} \]
\[ \text{subject to} \quad A_s[u]\, p \leq b_s[u] \tag{13b} \]
\[ \mathbb{1}^\top p = 1, \tag{13c} \]


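where $\mathbb{1}$ denotes a column vector of ones of appropriate size. Let $x_{\mathrm{post}(s)} =
[x_{s'}]_{s' \in \mathrm{post}(s)}$ be the vector of decision variables corresponding to the (ordered)
successor states in $\mathrm{post}(s)$. The dual problem of Eq. (13), with dual variables
$\alpha \in \mathbb{R}^{m_s}$ and $\beta \in \mathbb{R}$ (see, e.g., [11] for details), is written as follows:


\[ \text{maximize} \quad -b_s[u]^\top \alpha - \beta \tag{14a} \]
\[ \text{subject to} \quad A_s[u]^\top \alpha + x_{\mathrm{post}(s)} + \beta \mathbb{1} = 0 \tag{14b} \]
\[ \alpha \geq 0. \tag{14c} \]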

¹ Even if a non-vertex solution $y^+, z^+$ is obtained, we can use an arbitrary tie-break
rule on $z^+$, which forces each $z_i^+$ binary and preserves the sum in Eq. (10d).


Fig. 4. Three polytopic uncertainty sets (blue shade), with the vector $x$, the worst-case
points $p^\star$, and the active constraints shown in red. Panels: (a) Well-defined optimum.
(b) Non-unique optimum. (c) Too many active constraints. (Color figure online)


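By using this dual problem in Eq. (12b), we obtain the following LP with decision
variables $x \in \mathbb{R}^{|S|}$, and with $\alpha_s \in \mathbb{R}^{m_s}$ and $\beta_s \in \mathbb{R}$ for every $s \in S$:


\[ \text{maximize} \quad s_I^\top x \tag{15a} \]
\[ \text{subject to} \quad x_s = 0, \quad \forall s \in S_T \tag{15b} \]
\[ x_s = r_s - \big( b_s[u]^\top \alpha_s + \beta_s \big), \quad \forall s \in S \setminus S_T \tag{15c} \]
\[ A_s[u]^\top \alpha_s + x_{\mathrm{post}(s)} + \beta_s \mathbb{1} = 0, \quad \alpha_s \geq 0, \quad \forall s \in S \setminus S_T. \tag{15d} \]


The reformulation of Eq. (12) to Eq. (15) requires that $s_I \geq 0$, which is trivially
satisfied because $s_I$ is a probability distribution. Denote by $x^\star, \alpha^\star, \beta^\star$ an optimal
point of Eq. (15). The $x^\star$ element of this optimum is also an optimal solution of
Eq. (12) [12]. Thus, the robust solution defined in Eq. (4) is $\mathrm{sol}_R[u] = s_I^\top x^\star$.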


5.1 Computing Derivatives via pMCs (and When It Does Not Work)


Toward solving Problem 1, we provide some intuition about computing robust
solutions for prMCs. The infimum in Eq. (12) finds the worst-case point $p^\star$ in
each set $\mathcal{P}(s)[u]$ that minimizes $(p^\star)^\top x$. This minimization is visualized in Fig. 4a
for an uncertainty set that captures three probability intervals $\underline{p}_i \leq p_i \leq \bar{p}_i$, $i = 1, 2, 3$.
Given the optimization direction $x$ (arrow in Fig. 4a), the point $p^\star$ (red
dot) is attained at the vertex where the constraints $\underline{p}_1 \leq p_1$ and $\underline{p}_2 \leq p_2$ are
active.² Thus, we obtain that the point in the polytope that minimizes $(p^\star)^\top x$
is $p^\star = [\underline{p}_1, \underline{p}_2, 1 - \underline{p}_1 - \underline{p}_2]^\top$. Using this procedure, we can obtain a worst-case
point $p_s^\star$ for each state $s \in S$. We can use these points to convert the prMC into
an induced pMC with transition function $P(s) = p_s^\star$ for each state $s \in S$.

For small changes in the parameters, the point p* in Fig. 4a changes smoothly, 
and its closed-form expression (i.e., the functional form) remains the same. As 
such, it feels intuitive that we could apply the methods from Sect. 4 to compute 
partial derivatives on the induced pMC. However, this approach does not always 
work, as illustrated by the following two corner cases. 


² An inequality constraint $gx \leq h$ is active under the optimal solution $x^\star$ if $gx^\star = h$ [15].


1. Consider Fig. 4b, where the optimization direction defined by $x$ is parallel to
one of the facets of the uncertainty set. In this case, the worst-case point $p^\star$
is not unique, but an infinitesimal change in the optimization direction $x$ will
force the point to one of the vertices again. Which point should we choose to
obtain the induced pMC (and does this choice affect the derivative)?

2. Consider Fig. 4c with more than $|S| - 1$ active constraints at the point $p^\star$.
Observe that decreasing $p_3$ changes the point $p^\star$ while increasing $p_3$ does not.
In fact, the optimal point $p^\star$ changes non-smoothly with the halfspaces of the
polytope. As a result, the solution also changes non-smoothly, and thus, the
derivative is not defined. How do we deal with such a situation?


These examples show that computing derivatives via an induced pMC by obtaining
each point $p_s^\star$ can be tricky or is, in some cases, not possible at all. In what
follows, we present a method that directly derives a set of linear equations to 
obtain derivatives for prMCs (all or only the k highest) based on the solution to 
the LP in Eq. (15), which intrinsically identifies the corner cases above in which 
the derivative is not defined. 


5.2 Computing Derivatives Explicitly 


We now develop a dedicated method for identifying if the derivative of the solution 
function for a prMC exists, and if so, to compute this derivative. Observe from 
Fig. 4 that the point p* is uniquely defined and has a smooth derivative only in 
Fig. 4a with two active constraints. For only one active constraint (Fig. 4b), the 
point is underdetermined, while for three active constraints (Fig. 4c), the derivative 
may not be smooth. In the general case, having exactly $n - 1$ active constraints
(whose facets are nonparallel) is a sufficient condition for obtaining a unique and
smoothly changing point $p^\star$ in the $n$-dimensional probability simplex.
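The counting condition can be illustrated on interval uncertainty sets. The sketch below computes the worst-case point with the standard greedy procedure for probability intervals and counts how many interval bounds hold with equality. Note that it inspects primal slacks rather than dual variables, so it is an illustration of the $n - 1$ condition and not the dual-based test developed in this section; the helper names and the numbers are hypothetical.

```python
import numpy as np


def worst_case_point(x, lower, upper):
    """Minimize x^T p over {lower <= p <= upper, sum(p) = 1} greedily."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    p = lower.copy()
    budget = 1.0 - p.sum()                  # probability mass left to distribute
    for i in np.argsort(x):                 # fill successors with the smallest x first
        add = min(upper[i] - p[i], budget)
        p[i] += add
        budget -= add
    return p


def count_active(p, lower, upper, tol=1e-9):
    """Number of interval bounds that are tight (active) at p."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return int(np.sum(np.abs(p - lower) < tol) + np.sum(np.abs(upper - p) < tol))


x = [3.0, 1.0, 2.0]
lower = [0.1, 0.2, 0.3]

# Regular case: two active bounds for n = 3 successors.
upper_regular = [0.5, 0.8, 0.7]
p_regular = worst_case_point(x, lower, upper_regular)
active_regular = count_active(p_regular, lower, upper_regular)

# Degenerate case: the upper bound on the second successor is also tight.
upper_degenerate = [0.5, 0.6, 0.7]
p_degenerate = worst_case_point(x, lower, upper_degenerate)
active_degenerate = count_active(p_degenerate, lower, upper_degenerate)
```

Both cases yield the same point $[0.1, 0.6, 0.3]^\top$, but only the first has exactly $n - 1 = 2$ active bounds. In the second, three bounds are tight, which corresponds to the situation of Fig. 4c where the point does not change smoothly with the bounds.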


Optimal Dual Variables. The optimal dual variables $\alpha_s^\star \geq 0$ for each $s \in S \setminus S_T$
in Eq. (15) indicate which constraints of the polytope $A_s[u]\, p \leq b_s[u]$ are active,
i.e., for which rows $a_{s,i}[u]$ of $A_s[u]$ it holds that $a_{s,i}[u]\, p^\star = b_{s,i}[u]$. Specifically, a
value of $\alpha_{s,i}^\star > 0$ implies that the $i$-th constraint is active, and $\alpha_{s,i}^\star = 0$ indicates
a nonactive constraint [15]. We define $E_s = [e_1, \ldots, e_{m_s}] \in \{0, 1\}^{m_s}$ as a vector
whose binary values $e_i$ $\forall i \in \{1, \ldots, m_s\}$ are given as $e_i = [\alpha_{s,i}^\star > 0]$.³ Moreover,
denote by $D(E_s)$ the matrix with $E_s$ on the diagonal and zeros elsewhere. We
reduce the LP in Eq. (15) to a system of linear equations that encodes only the
constraints that are active under the worst-case point $p_s^\star$ for each $s \in S \setminus S_T$:


\[ x_s = 0, \quad \forall s \in S_T \tag{16a} \]
\[ x_s = r_s - \big( b_s[u]^\top D(E_s)\, \alpha_s + \beta_s \big), \quad \forall s \in S \setminus S_T \tag{16b} \]
\[ A_s[u]^\top D(E_s)\, \alpha_s + x_{\mathrm{post}(s)} + \beta_s \mathbb{1} = 0, \quad \alpha_s \geq 0, \quad \forall s \in S \setminus S_T. \tag{16c} \]


Differentiation. However, when does Eq. (16) have a (unique) optimal solution?
To provide some intuition, let us write the equation system in matrix form, i.e.,
$C\, [x\ \alpha\ \beta]^\top = d$, where we omit an explicit definition of matrix $C$ and vector $d$
for brevity. It is apparent that if matrix $C$ is nonsingular, then Eq. (16) has a
unique solution. This requires matrix $C$ to be square, which is achieved if, for
each $s \in S \setminus S_T$, we have $|\mathrm{post}(s)| = \sum E_s + 1$. In other words, the number of
successor states of $s$ is equal to the number of active constraints of the polytope
plus one. This confirms our previous intuition from Sect. 5.1 on a polytope for
$|\mathrm{post}(s)| = 3$ successor states, which required $\sum_i E_i = 2$ active constraints.

Let us formalize this intuition about computing derivatives for prMCs. We
can compute the derivative of the solution $x^\star$ by differentiating the equation
system in Eq. (16) through the product rule, in a very similar manner to the
approach in Sect. 4. We state this key result in the following theorem.


³ We use Iverson-brackets: $[x] = 1$ if $x$ is true and $[x] = 0$ otherwise.


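Theorem 2. Given a prMC $\mathcal{M}_R$ and an instantiation $u$, compute $x^\star, \alpha^\star, \beta^\star$ for
Eq. (15) and choose a parameter $v_i \in V$. The partial derivatives $\frac{\partial x}{\partial u(v_i)}$, $\frac{\partial \alpha}{\partial u(v_i)}$,
and $\frac{\partial \beta}{\partial u(v_i)}$ are obtained as the solution to the linear equation system


\[ \frac{\partial x_s}{\partial u(v_i)} = 0, \quad \forall s \in S_T \tag{17a} \]
\[ \frac{\partial x_s}{\partial u(v_i)} + b_s[u]^\top D(E_s)\, \frac{\partial \alpha_s}{\partial u(v_i)} + \frac{\partial \beta_s}{\partial u(v_i)} = -(\alpha_s^\star)^\top D(E_s)\, \frac{\partial b_s[u]}{\partial u(v_i)}, \quad \forall s \in S \setminus S_T \tag{17b} \]
\[ A_s[u]^\top D(E_s)\, \frac{\partial \alpha_s}{\partial u(v_i)} + \frac{\partial x_{\mathrm{post}(s)}}{\partial u(v_i)} + \frac{\partial \beta_s}{\partial u(v_i)}\, \mathbb{1} = -\frac{\partial A_s[u]^\top}{\partial u(v_i)}\, D(E_s)\, \alpha_s^\star, \quad \forall s \in S \setminus S_T. \tag{17c} \]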


The proof follows from applying the product rule to Eq. (16) and is provided in
[6, Appendix A.1]. To compute the derivative for a parameter $v_i \in V$, we thus
solve a system of linear equations of size $|S| + \sum_{s \in S \setminus S_T} |\mathrm{post}(s)|$. Using Theorem
2, we obtain sufficient conditions for the solution function to be differentiable.


Lemma 1. Write the linear equation system in Eq. (17) in matrix form, i.e.,


\[ C \Big[ \frac{\partial x}{\partial u(v_i)},\ \frac{\partial \alpha}{\partial u(v_i)},\ \frac{\partial \beta}{\partial u(v_i)} \Big]^\top = d, \tag{18} \]


for $C \in \mathbb{R}^{q \times q}$ and $d \in \mathbb{R}^q$, $q = |S| + \sum_{s \in S \setminus S_T} |\mathrm{post}(s)|$, which are implicitly given
by Eq. (17). The solution function $\mathrm{sol}_R[u]$ is differentiable at instantiation $u$ if
matrix $C$ is nonsingular, in which case we obtain $\big( \frac{\partial \mathrm{sol}_R}{\partial u(v_i)} \big)[u] = s_I^\top \frac{\partial x}{\partial u(v_i)}$.


Proof. The partial derivative of the solution function is $\big( \frac{\partial \mathrm{sol}_R}{\partial u(v_i)} \big)[u] = s_I^\top \frac{\partial x}{\partial u(v_i)}$,
where $\frac{\partial x}{\partial u(v_i)}$ is (a part of) the solution to Eq. (17). Thus, the solution function
is differentiable if there is a (unique) solution to Eq. (17), which is guaranteed
if matrix $C$ is nonsingular. Thus, the claim in Lemma 1 follows.


Algorithm for Problem 1. We use Theorem 2 to solve Problem 1 for prMCs,
similarly as for pMCs. Given a prMC $\mathcal{M}_R$ and an instantiation $u$, we first solve
Eq. (15) to obtain $x^\star, \alpha^\star, \beta^\star$. Second, we use $\alpha_s^\star$ to compute the vector $E_s$ of
active constraints for each $s \in S \setminus S_T$. Third, for every parameter $v \in V$, we solve
the equation system in Eq. (17). Thus, to compute the gradient of the solution
function, we solve one LP and $|V|$ linear equation systems.


5.3 Computing k-Highest Derivatives 


We directly apply the same procedure from Sect. 4.2 to compute the parameters
with the $k \leq \ell$ highest derivatives. As for pMCs, we can compute the $k$ highest
derivatives by solving a MILP encoding the equation system in Eq. (17) for 
every parameter v € V, which we present in [6, Appendix A.2] for brevity. This 
MILP has the same structure as Eq. (9), and thus we may apply the same linear 
relaxation to obtain an LP with the guarantees as stated in Theorem 1. In other 
words, solving the LP relaxation yields the set V* of parameters with maximal 
derivatives as in Proposition 1. This set V* is a solution to Problem 2 for prMCs. 


6 Numerical Experiments 


We perform experiments to answer the following questions about our approach: 


1. Is it feasible (in terms of computational complexity and runtimes) to compute 
all derivatives, in particular compared to computing (robust) solutions? 

2. How does computing only the k highest derivatives compare to computing all 
derivatives? 

3. Can we apply our approach to effectively determine for which parameters to 
sample in a learning framework? 


Let us briefly summarize the computations involved in answering these questions.
First of all, computing the solution $\mathrm{sol}(u)$ for a pMC, which is defined in Eq.
(1), means solving the linear equation system in Eq. (5). Similarly, computing
the robust solution $\mathrm{sol}_R(u)$ for a prMC means solving the LP in Eq. (15). Then,
solving Problem 1, i.e., computing all $|V|$ partial derivatives, amounts to solving
a linear equation system for each parameter $v \in V$ (namely, Eq. (7) for a pMC
and Eq. (17) for a prMC). In contrast, solving Problem 2, i.e., computing a subset
$V^\star$ of parameters with maximal (or minimal) derivative, means for a pMC that
we solve the LP in Eq. (10) (or the equivalent LP for a prMC) and thereafter
extract the subset $V^\star$ of parameters using Proposition 1.


Problem 3: Computing the k-highest Derivatives. A solution to Problem 2 is a
set $V^\star$ of $k$ parameters but does not include the computation of the derivatives.
However, it is straightforward to also obtain the actual derivatives $\big( \frac{\partial \mathrm{sol}}{\partial u(v)} \big)[u]$
for each parameter $v \in V^\star$. Specifically, we solve Problem 1 for the $k$ parameters
in $V^\star$, such that we obtain the partial derivatives for all $v \in V^\star$. We remark that,
for $k = 1$, the derivative follows directly from the optimal value $s_I^\top y^+$ of the LP
in Eq. (10), so this additional step is not necessary. We will refer to computing
the actual values of the $k$ highest derivatives as Problem 3.


Setup. We implement our approach in Python 3.10, using Storm [35] to parse
pMCs, Gurobi [31] to solve LPs, and the SciPy sparse solver to solve equation
systems. All experiments run on a computer with a 4 GHz Intel Core i9 CPU
and 64 GB RAM, with a timeout of one hour. Our implementation is available
at https://doi.org/10.5281/zenodo.7864260.


Grid World Benchmarks. We use scaled versions of the grid world from the
example in Sect. 2 with over a million states and up to 10000 terrain types. The
vehicle only moves right or down, both with 50% probability (wrapping around
when leaving the grid). Slipping only occurs when moving down and (slightly
different from the example in Sect. 2) means that the vehicle moves two cells
instead of one. We obtain between $N = 500$ and $1000$ samples of each slipping
probability. For the pMCs, we use the maximum likelihood estimates (i.e., the
sample means) obtained from these samples as probabilities, whereas, for the
prMCs, we infer probability intervals using Hoeffding's inequality (see Q3 for
details).


Benchmarks from Literature. We also use several instances of parametric extensions
of MCs and Markov decision processes (MDPs) from standard benchmark
suites [33,44]. We also use pMC benchmarks from [5,23] as these models have
more parameters than the traditional benchmarks. We extend these benchmarks
to prMCs by constructing probability intervals around the pMC's probabilities.


Results. The results for all benchmarks are shown in [6, Appendix B, Tab. 2-3]. 


Q1. Computing Solutions vs. Derivatives 


We investigate whether computing derivatives is feasible on p(r)MCs. In particular,
we compare the computation times for computing derivatives on p(r)MCs
(Problems 1 and 3) with the times for computing the solution for these models.


Fig. 5. Runtimes (log-scale) for computing a single derivative (left, Problem 1) or the
highest derivative (right, Problem 3), vs. computing the solution sol[u]/sol_R[u].


Table 1. Model sizes, runtimes, and derivatives for selection of grid world models.
Column groups: Model statistics (Type, |S|, |V|, #trans); Verifying (sol_(R)[u], Time [s]);
Problem 1 (All derivs. [s]); Problem 3 (k = 1 [s], k = 10 [s]); Derivatives (Highest, Error %).

Type   |S|       |V|     #trans    sol_(R)[u]   Time [s]   All derivs. [s]   k = 1 [s]   k = 10 [s]   Highest     Error %
pMC    5000      50      14995     5.07         1.39       3.32              2.64        2.69         1.54e+00    0.0
pMC    5000      100     14995     5.05         1.36       4.17              2.63        2.66         1.28e+00    0.0
pMC    5000      921     14995     4.93         1.87       19.92             4.52        2.87         1.20e+00    0.0
pMC    80000     100     239995    8.01         25.54      98.47             45.18       46.87        1.95e+00    0.0
pMC    80000     1000    239995    8.01         25.64      612.97            48.92       58.20        2.08e+00    0.0
pMC    80000     9831    239995    7.93         25.52      5,650.25          347.76      1,343.59     2.10e+00    0.0
pMC    1280000   100     3839995   12.90        902.52     4,747.43          1,396.51    1,507.77     3.32e+00    0.0
pMC    1280000   1000    3839995   12.79        902.67     37,078.12         1,550.45    1,617.27     3.18e+00    0.0
pMC    1280000   10000   3839995   Timeout (b)

prMC   5000      100     14995     136.07       23.46      3.55              0.60        1.58         -1.26e-02   -0.0
prMC   5000      921     14995     138.74       29.82      25.23             0.85        1.09         -4.44e-03   -0.0
prMC   20000     100     59995     2,789.77     1,276.43   15.68             2.40        2.70         -4.96e-01   -0.1
prMC   20000     1000    59995     2,258.41     339.96     159.70            3.53        4.09         -9.51e-02   -0.0
prMC   80000     100     239995    Timeout (b)

(a) Extrapolated from the runtimes for 10 to all |V| parameters.
(b) Timeout (1 h) occurred for verifying the p(r)MC, not for computing derivatives.
In Fig. 5, we show for all benchmarks the times for computing the solution
(defined in Eqs. (1) and (4)), versus computing either a single derivative for Problem 1
(left) or the highest derivative of all parameters resulting from Problem 3
(right). A point $(x, y)$ in the left plot means that computing a single derivative
took $x$ seconds while computing the solution took $y$ seconds. A point above the
(center) diagonal means we obtained a speed-up over the time for computing the
solution; a point over the upper diagonal indicates a 10× speed-up or larger.


One Derivative. The left plot in Fig. 5 shows that, for pMCs, the times for
computing the solution and a single derivative are approximately the same. This 
is expected since both problems amount to solving a single equation system with 
|S| unknowns. Recall that, for prMCs, computing the solution means solving 
the LP in Eq. (15), while for derivatives we solve an equation system. Thus, 
computing a derivative for a prMC is relatively cheap compared to computing 
the solution, which is confirmed by the results in Fig. 5. 


Highest Derivative. The right plot in Fig. 5 shows that, for pMCs, computing
the highest derivative is slightly slower than computing the solution (the LP to
compute the highest derivative takes longer than the equation system to compute
the solution). On the other hand, computing the highest derivative for a prMC
is still cheap compared to computing the solution. Thus, if we are using a prMC
anyway, computing the derivatives is relatively cheap.


Q2. Runtime Improvement of Computing only k Derivatives 


We want to understand the computational benefits of solving Problem 3 over
solving Problem 1. For Q2, we consider all models with $|V| \geq 10$ parameters.
An excerpt of results for the grid world benchmarks is presented in Table 1.
Recall that, after obtaining the (robust) solution, solving Problem 1 amounts
to solving $|V|$ linear equation systems, whereas Problem 3 involves solving a
single LP and $k$ equation systems. From Table 1, it is clear that computing $k$
derivatives is orders of magnitude faster than computing all $|V|$ derivatives,
especially if the total number of parameters is high.


Fig. 6. Runtimes (log-scale) for computing the highest (left) or 10 highest (right)
derivatives (Problem 3), versus computing all derivatives (Problem 1).


We compare the runtimes for computing all derivatives (Problem 1) with
computing only the $k = 1$ or $10$ highest derivatives (Problem 3). The left plot
of Fig. 6 shows the runtimes for $k = 1$, and the right plot for the $k = 10$ highest
derivatives. The interpretation for Fig. 6 is the same as for Fig. 5. From Fig. 6,
we observe that computing only the $k$ highest derivatives generally leads to
significant speed-ups, often of more than 10 times (except for very small models).
Moreover, the difference between $k = 1$ and $k = 10$ is minor, showing that
retrieving the actual derivatives after solving Problem 2 is relatively cheap.


Numerical Stability. While our algorithm is exact, our implementation uses
floating-point arithmetic for efficiency. To evaluate the numerical stability, we
compare the highest derivatives (solving Problem 3 for $k = 1$) with an empirical
approximation of the derivative obtained by perturbing the parameter by
1 x 10°. The difference (column 'Error %' in Table 1 and [6, Appendix B,
Table 2]) between both is marginal, indicating that our implementation is
sufficiently numerically stable to return accurate derivatives.
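A check of this kind can be sketched on a toy pMC by comparing the derivative obtained from the linear system in Eq. (7) with a forward finite difference. The toy model, the function names, and in particular the step size `h = 1e-6` are assumptions made for this illustration; they are not the values used in the experiments.

```python
import numpy as np


def solve_x(v):
    """Toy pMC: s0 loops w.p. v and reaches terminal s1 w.p. 1 - v; reward 1 in s0."""
    P = np.array([[v, 1.0 - v],
                  [0.0, 0.0]])              # row of the terminal state set to zero
    r = np.array([1.0, 0.0])
    return np.linalg.solve(np.eye(2) - P, r), P


def analytic_derivative(v):
    """Derivative of sol = x_{s0} w.r.t. v via the linear system of Eq. (7)."""
    x_star, P = solve_x(v)
    dP = np.array([[1.0, -1.0],
                   [0.0, 0.0]])             # entrywise derivative of P w.r.t. v
    dx = np.linalg.solve(np.eye(2) - P, dP @ x_star)
    return dx[0]                            # s_I = [1, 0]


def finite_difference(v, h=1e-6):
    """Forward-difference approximation with an assumed step size h."""
    return (solve_x(v + h)[0][0] - solve_x(v)[0][0]) / h


v = 0.5
exact = analytic_derivative(v)
approx = finite_difference(v)
rel_error_percent = 100.0 * abs(exact - approx) / abs(exact)
```

For this model the solution is $1/(1-v)$, so the exact derivative at $v = 0.5$ is 4, and the relative error of the finite difference is dominated by the truncation error of the forward difference, which scales with `h`.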


Q3. Application in a Learning Framework 


Reducing the sample complexity is a key challenge in learning under uncertainty
[43,47]. In particular, learning in stochastic environments is very data-intensive,
and realistic applications tend to require millions of samples to provide
tight bounds on measures of interest [16]. Motivated by this challenge, we apply
our approach in a learning framework to investigate if derivatives can be used
to effectively guide exploration, compared to alternative exploration strategies.


Fig. 7. Robust solutions for each sampling strategy in the learning framework for the
grid world (a) and drone (b) benchmarks. Average values of 10 (grid world) or 5
(drone) repetitions are shown, with shaded areas the min/max. Panels: (a) Slippery
grid world, horizontal axis in steps (of 25 samples each). (b) Drone motion planning,
horizontal axis in steps (of 250 samples each).


Models. We consider the problem of where to sample in 1) a slippery grid world
with $|S| = 800$ and $|V| = 100$ terrain types, and 2) the drone benchmark from
[23] with $|S| = 4179$ and $|V| = 1053$ parameters. As in the motivating example
in Sect. 2, we learn a model of the unknown MC in the form of a prMC, where
the parameters are the sample sizes for each parameter. We assume access to a
model that can arbitrarily sample each parameter (i.e., the slipping probability
in the case of the grid world). We use an initial sample size of $N_i = 100$ for
each parameter $i \in \{1, \ldots, |V|\}$, from which we infer a $\beta = 0.9$ (90%) confidence
interval using Hoeffding's inequality. The interval for parameter $i$ is
$[\hat{p}_i - \epsilon_i, \hat{p}_i + \epsilon_i]$, with $\hat{p}_i$ the sample mean and
$\epsilon_i = \sqrt{\frac{\log 2 - \log(1 - \beta)}{2 N_i}}$ (see, e.g., [14] for details).
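For reference, the interval half-width implied by the two-sided Hoeffding bound, $2\exp(-2N\epsilon^2) = 1 - \beta$, can be computed as in the sketch below. The function name `hoeffding_interval` and the clipping of the interval to $[0, 1]$ are assumptions of this example and are not stated in the text.

```python
import math


def hoeffding_interval(successes, n, beta=0.9):
    """Two-sided Hoeffding confidence interval around the sample mean.

    Solves 2 * exp(-2 * n * eps**2) = 1 - beta for eps.
    Returns (lower, upper, eps); clipping to [0, 1] is an assumption made here.
    """
    p_hat = successes / n
    eps = math.sqrt((math.log(2) - math.log(1 - beta)) / (2 * n))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps), eps


# Hypothetical data: 20 slips observed in N = 100 samples.
lo_100, hi_100, eps_100 = hoeffding_interval(20, 100)

# Adding samples shrinks the interval at rate 1 / sqrt(N).
lo_400, hi_400, eps_400 = hoeffding_interval(80, 400)
```

With $N = 100$ and $\beta = 0.9$ the half-width is roughly 0.12, and quadrupling the sample size halves it, which is why the interval bounds of a prMC learned this way depend smoothly on the sample sizes.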


Learning Scheme. We iteratively choose for which parameter $v_i \in V$ to obtain 25
(for the grid world) or 250 (for the drone) additional samples. We compare four
strategies for choosing the parameter $v_i$ to sample for: 1) with highest derivative,
i.e., solving Problem 3 for $k = 1$; 2) with biggest interval width $\epsilon_i$; 3) uniformly;
and 4) sampling according to the expected number of visits times the interval
width (see [6, Appendix B.1] for details). After each step, we update the robust
upper bound on the solution for the prMC with the additional samples.


Results. The upper bounds on the solution for each sampling strategy, as well as 
the solution for the MC with the true parameter values, are shown in Fig. 7. For 
both benchmarks, our derivative-guided sampling strategy converges to the true 
solution faster than the other strategies. Notably, our derivative-guided strategy 
accounts for both the uncertainty and importance of each parameter, which leads 
to a lower sample complexity required to approach the true solution. 


7 Related Work 


We discuss related work in three areas: pMCs, their extension to parametric 
interval Markov chains (piMCs), and general sensitivity analysis methods. 


Parametric Markov Chains. pMCs [24,45] have traditionally been studied in 
terms of computing the solution function [13,25,28,29,32]. Much recent litera- 
ture considers synthesis (find a parameter valuation such that a specification is 
satisfied) or verification (prove that all valuations satisfy a specification). We 
refer to [38] for a recent overview. For our paper, particularly relevant are [55], 
which checks whether a derivative is positive (for all parameter valuations), 
and [34], which solves parameter synthesis via gradient descent. We note that 
all these problems are (co-)ETR complete [41] and that the solution function 
is exponentially large in the number of parameters [7], whereas we consider a 
polynomial-time algorithm. Furthermore, practical verification procedures for 
uncontrollable parameters (as we consider) are limited to fewer than 10 parameters. 
Parametric verification is used in [51] to guide model refinement by detecting 
for which parameter values a specification is satisfied. In contrast, we consider 
slightly more conservative rMCs and aim to stepwise optimize an objective. 
Solution functions also provide an approach to compute and refine confidence 
intervals [17]; however, the size of the solution function hampers scalability. 


Parametric interval Markov Chains (piMCs). While prMCs have, to the best of 
our knowledge, not been studied, a slightly more restricted version, piMCs, has. 
In particular, piMCs have interval-valued transitions with parametric bounds. 
Work on piMCs falls into two categories. First, consistency [27,50]: is there a 
parameter instantiation such that the (reachable fragment of the) induced inter- 
val MC contains valid probability distributions? Second, parameter synthesis for 
quantitative and qualitative reachability in piMCs with up to 12 parameters [10]. 


Perturbation Analysis. Perturbation analysis considers the change in solution 
by any perturbation vector X for the parameter instantiation, whose norm is 
upper bounded by $\delta$, i.e., $\|X\| \le \delta$ (or conversely, which $\delta$ ensures the solution 
perturbation is below a given maximum). Likewise, [21] uses the distance 
between two instantiations of a pMC (called augmented interval MC) to bound 
the change in reachability probability. Similar analyses exist for stationary dis- 
tributions [1]. These problems are closely related to the verification problem in 
pMCs and are equally (in)tractable if there are dependencies over multiple param- 
eters. To improve tractability, a follow-up [56] derives asymptotic bounds based on 
first or second-order Taylor expansions. Other approaches to perturbation analysis 
analyze individual paths of a system [18,19,30]. Sensitivity analysis in (parameter-free) 
imprecise MCs, a variation of rMCs, is thoroughly studied in [22]. 


Exploration in Learning. Similar to Q3 in Sect. 6, determining where to sample is 
relevant in many learning settings. Approaches such as probably approximately 
correct (PAC) statistical model checking [2,3] and model-based reinforcement 
learning [47] commonly use optimistic exploration policies [48]. By contrast, we 
guide exploration based on the sensitivity analysis of the solution function with 
respect to the parametric model. 


8 Concluding Remarks 


We have presented efficient methods to compute partial derivatives of the solu- 
tion functions for pMCs and prMCs. For both models, we have shown how to 
compute these derivatives explicitly for all parameters, as well as how to compute 
only the k highest derivatives. Our experiments have shown that we can compute 
derivatives for models with over a million states and thousands of parameters. 
In particular, computing the k highest derivatives yields significant speed-ups 
compared to computing all derivatives explicitly and is feasible for prMCs which 
can be verified. In the future, we want to support nondeterminism in the models 
and apply our methods in (online) learning frameworks, in particular for settings 
where reducing the uncertainty is computationally expensive [42,49]. 


Abstract. Markov decision processes can be viewed as transformers 
of probability distributions. While this view is useful from a practical 
standpoint to reason about trajectories of distributions, basic reacha- 
bility and safety problems are known to be computationally intractable 
(i.e., Skolem-hard) to solve in such models. Further, we show that even 
for simple examples of MDPs, strategies for safety objectives over distri- 
butions can require infinite memory and randomization. 

In light of this, we present a novel overapproximation approach to 
synthesize strategies in an MDP, such that a safety objective over the 
distributions is met. More precisely, we develop a new framework for 
template-based synthesis of certificates as affine distributional and induc- 
tive invariants for safety objectives in MDPs. We provide two algorithms 
within this framework. One can only synthesize memoryless strategies, 
but has relative completeness guarantees, while the other can synthe- 
size general strategies. The runtime complexity of both algorithms is in 
PSPACE. We implement these algorithms and show that they can solve 
several non-trivial examples. 


Keywords: Markov decision processes · invariant synthesis · 
distribution transformers · Skolem hardness 


1 Introduction 


Markov decision processes (MDPs) are a classical model for probabilistic decision 
making systems. They extend the basic probabilistic model of Markov chains with 
non-determinism and are widely used across different domains and contexts. In the 
verification community, MDPs are often viewed through an automata-theoretic 
lens, as state transformers, with runs being sequences of states with certain prob- 
ability for taking each run (see e.g., [9]). With this view, reachability probabili- 
ties can be computed using simple fixed point equations and model checking can 
be done over appropriately defined logics such as PCTL*. However, in several 
contexts such as modelling biochemical networks, queueing theory or probabilis- 
tic dynamical systems, it is more convenient to view MDPs as transformers of 
probability distributions over the states, and define objectives over these distri- 
butions [1,5,12,17,44,47]. In this framework, we can, for instance, easily reason 
about properties such as the probability in a set of states always being above a 
given threshold or comparing the probability in two states at some future time 
point. More concretely, in a chemical reaction network, we may require that the 
concentration of a particular complex is never above 10%. Such distribution-based 
properties cannot be expressed in PCTL* [12], and thus several orthogonal logics 
have been defined [1,12,44] that reason about distributions. 

Unfortunately, and perhaps surprisingly, when we view them as distribution 
transformers even the simplest reachability and safety problems with respect to 
probability distributions over states remain unsolved. The reason for this is a 
number-theoretical hardness result that lies at the core of these questions. In [3], 
it is shown that even with just Markov chains, reachability is as hard as the so- 
called SKOLEM problem, and safety is as hard as the POSITIVITY problem [55, 
56], the decidability of both of which are long-standing open problems in linear 
recurrence sequences. Moreover, synthesizing strategies that resolve the non- 
determinism in MDPs to achieve an objective (whether reachability or safety) 
is further complicated by the issue of how much memory can be allowed for the 
strategy. As we show in Sect. 3, even for very simple examples, strategies for 
safety can require infinite memory as well as randomization. 

In light of these difficulties, what can one do to tackle these problems in 
theory and in practice? In this paper, we take an over-approximation route to 
approach these questions, not only to check existence of strategies for safety but 
also synthesize them. Inspired by the success of invariant synthesis in program 
verification, our goal is to develop a novel invariant-synthesis based approach 
towards strategy synthesis in MDPs, viewed as transformers of distributions. In 
this paper, we restrict our attention to a class of safety objectives on MDPs, 
which are already general enough to capture several interesting and natural 
problems on MDPs. Our contributions are the following: 


1. We define the notion of inductive distributional invariants for safety in MDPs. 
These are sets of probability distributions over states of the MDP, that (i) 
contain all possible distributions reachable from the initial distribution, under 
all strategies of an MDP, and (ii) are closed under taking the next step. 

2. We show that such invariants provide sound and complete certificates for 
proving safety objectives in MDPs. In doing so, we formalize the link between 
strategies and distributional invariants in MDPs. This by itself does not help 
us get effective algorithms in light of the hardness results above. Hence we 
then focus on synthesizing invariants of a particular shape. 


88 S. Akshay et al. 


3. We develop two algorithms for automated synthesis of affine inductive distri- 
butional invariants that prove safety in MDPs, and at the same time, synthe- 
size the associated strategies. 

— The first algorithm is restricted to synthesizing memoryless strategies 
but is relatively complete, i.e., whenever a memoryless strategy and an 
affine inductive distributional invariant that witness safety exist, we are 
guaranteed to find them. 

— The second algorithm can synthesize general strategies as well as memo- 
ryless strategies, but is incomplete in general. 

In both cases, we employ a template-based synthesis approach and reduce 
synthesis to the existential first-order theory of reals, which gives a PSPACE 
complexity upper bound. In the first case, this reduction depends on Farkas’ 
lemma. In the second case, we need to use Handelman’s theorem, a specialized 
result for strictly positive polynomials. 

4. We implement our approaches and show that for several practical and non- 
trivial examples, affine invariants suffice. Further, we demonstrate that our 
prototype tool can synthesize these invariants and associated strategies. 


Finally, we discuss the generalization of our approach from affine to polynomial 
invariants and some variants that our approach can handle. 


1.1 Related Work 


Distribution-based Safety Analysis in MDPs. The problem of checking 
distribution-based safety objectives for MDPs was defined in [5] but a solution 
was provided only in the uninitialized setting, where the initial distribution is not 
given and also under the assumption that the target set is closed and bounded. 
In contrast, we tackle both initialized and uninitialized settings, our target sets 
are general affine sets and we focus on actually synthesizing strategies not just 
proving existence. 


Template-based Program Analysis. Template-based synthesis via the means of 
linear/polynomial constraint solving is a standard approach in program analy- 
sis to synthesizing certificates for proving properties of programs. Many of these 
methods utilize Farkas’ lemma or Handelman’s theorem to automate the synthesis 
of program invariants [20,27], termination proofs [6,14,23,28,57], reachability 
proofs [8] or cost bounds [16,39,64]. The works [2,18,19,21,22,24,25,62,63] 
utilize Farkas’ lemma or Handelman’s theorem to synthesize certificates for 
these properties in probabilistic programs. While our algorithms build on the 
ideas from the works on template-based inductive invariant synthesis in pro- 
grams [20,27], the key novelty of our algorithms is that they synthesize a fun- 
damentally different kind of invariants, i.e. distributional invariants in MDPs. 
In contrast, the existing works on (probabilistic) program analysis synthesize 
state invariants. Furthermore, our algorithms synthesize distributional invari- 
ants together with MDP strategies. While it is common in controller synthesis 


Invariant Synthesis for Affine Safety Objectives in MDPs 89 


to synthesize an MDP strategy for a state invariant, we are not aware of any 
previous work that uses template-based synthesis methods to compute MDP 
strategies for a distributional invariant. 


Other Approaches to Invariant Synthesis in Programs. Alternative approaches 
to invariant synthesis in programs have also been considered, for instance via 
abstract interpretation [29,30,33,60], counterexample guided invariant synthe- 
sis (CEGIS) [7,10,34], recurrence analysis [32,42,43] or learning [35,61]. While 
some of these approaches can be more scalable than constraint solving-based 
methods, they typically do not provide relative completeness guarantees. An 
interesting direction of future work would be to explore whether these alterna- 
tive approaches could be used for synthesizing distributional invariants together 
with MDP strategies more efficiently. 


Weakest Pre-expectation Calculus. Expectation transformers and the weakest 
pre-expectation calculus generalize Dijkstra’s weakest precondition calculus to 
the setting of probabilistic programs. Expectation transformers were introduced 
in the seminal work on probabilistic propositional dynamic logic (PPDL) [45] 
and were extended to the setting of probabilistic programs with non-determinism 
in [48,52]. Weakest pre-expectation calculus for reasoning about expected run- 
time of probabilistic programs was presented in [40]. Intuitively, given a function 
over probabilistic program outputs, the weakest pre-expectation calculus can be 
used to reason about the supremum or the infimum expected value of the func- 
tion upon executing the probabilistic program, where the supremum and the 
infimum are taken over the set of all possible schedulers (i.e. strategies) used to 
resolve non-determinism. When the function is the indicator function of some 
output set of states, this yields the method for reasoning about the probability of 
reaching the set of states. Thus, weakest pre-expectation calculus allows reason- 
ing about safety with respect to sets of states. In contrast, we are interested in 
reasoning about safety with respect to sets of probability distributions over states. 
Moreover, while the expressiveness of this calculus allows reasoning about very 
complex programs, its automation typically requires user input. In this work, we 
aim for a fully automated approach to checking distribution-based safety. 


2 Preliminaries 


In this section, we recall basics of probabilistic systems and set up our notation. 
We assume familiarity with the central ideas of measure and probability theory, 
see [13] for a comprehensive overview. We write [n] := {1,...,n} to denote the 
set of all natural numbers from 1 to n. For any set $S$, we use $\overline{S}$ to denote its 
complement. A probability distribution on a countable set $X$ is a mapping 
$\mu : X \to [0, 1]$, such that $\sum_{x \in X} \mu(x) = 1$. Its support is denoted by 
$\operatorname{supp}(\mu) = \{x \in X \mid \mu(x) > 0\}$. We write $\Delta(X)$ to denote the set of all probability 
distributions on $X$. An event happens almost surely (a.s.) if it happens with probability 1. 
We assume that countable sets of states S are equipped with an arbitrary but 
fixed numbering. 

Fig. 1. Our running example MDP. It comprises three states $S = \{A, B, C\}$, depicted 
by rounded rectangles. In state A, there are two actions available, namely a and b. We 
have $\delta(A, a, A) = 1$ and $\delta(A, b, B) = 1$, indicated by arrows. States B and C have only 
one available action each, thus we omit explicitly labelling them. 


2.1 Markov Systems 


A (discrete time) Markov chain (MC) is a tuple $M = (S, \delta)$, where $S$ is a finite 
set of states and $\delta : S \to \Delta(S)$ a transition function, assigning to each state a 
probability distribution over successor states. A Markov decision process (MDP) 
is a tuple $M = (S, \mathrm{Act}, \delta)$, where $S$ is a finite set of states, $\mathrm{Act}$ is a finite 
set of actions, overloaded to yield for each state $s$ the set of available actions 
$\mathrm{Act}(s) \subseteq \mathrm{Act}$, and $\delta : S \times \mathrm{Act} \to \Delta(S)$ is a transition function that for each 
state $s$ and (available) action $a \in \mathrm{Act}(s)$ yields a probability distribution over 
successor states. For readability, we write $\delta(s, s')$ and $\delta(s, a, s')$ instead of $\delta(s)(s')$ 
and $\delta(s, a)(s')$, respectively. By abuse of notation, we redefine 
$S \times \mathrm{Act} := \{(s, a) \mid s \in S \wedge a \in \mathrm{Act}(s)\}$ to refer to the set of state-action pairs. See Fig. 1 for an 
example MDP. This MDP is our running example and we refer to it throughout 
this work to point out some of the peculiarities. 

An infinite path in an MC is an infinite sequence $\rho = s_1 s_2 \cdots \in S^\omega$, such 
that for every $i \in \mathbb{N}$ we have $\delta(s_i, s_{i+1}) > 0$. A finite path $\varrho$ is a finite prefix 
of an infinite path. Analogously, infinite paths in an MDP are infinite sequences 
$\rho = s_1 a_1 s_2 a_2 \cdots \in (S \times \mathrm{Act})^\omega$ such that $a_i \in \mathrm{Act}(s_i)$ and $\delta(s_i, a_i, s_{i+1}) > 0$ for 
every $i \in \mathbb{N}$, and finite paths are finite prefixes thereof. We use $\rho_i$ and $\varrho_i$ to refer 
to the i-th state in the given (in)finite path, and $\mathsf{IPaths}_M$ and $\mathsf{FPaths}_M$ for the 
set of all (in)finite paths of a system $M$. 


Semantics. A Markov chain evolves by repeatedly applying the probabilistic 
transition function in each step. For example, if we start in state $s_1$, we obtain the 
next state $s_2$ by drawing a random state according to the probability distribution 
$\delta(s_1)$. Repeating this ad infinitum produces a random infinite path. Indeed, 
together with an initial state $s$, a Markov chain $M$ induces a unique probability 
measure $\Pr_{M,s}$ over the (uncountable) set of infinite paths [9]. 

This reasoning can be lifted to distributions over states, as follows. Suppose 
we begin in $\mu_0 = \{s_1 \mapsto 0.5, s_2 \mapsto 0.5\}$, meaning that initially we are in state $s_1$ 
or $s_2$ with probability 0.5 each. Then, $\mu_1(s') = \mu_0(s_1) \cdot \delta(s_1, s') + \mu_0(s_2) \cdot \delta(s_2, s')$, 
i.e. the probability to be in a state $s'$ in the next step is 0.5 times the probability 
of moving from $s_1$ and $s_2$ there, respectively. For an initial distribution $\mu_0$, 
we likewise obtain a probability distribution over infinite paths by setting 
$\Pr_{M,\mu_0}[S] := \sum_{s \in S} \mu_0(s) \cdot \Pr_{M,s}[S]$ for measurable $S \subseteq \mathsf{IPaths}_M$. 
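
The one-step update of a distribution can be sketched in a few lines of Python. The two-state chain below is a hypothetical example of our own (it does not appear in the paper), and the helper name `step` is likewise an assumption; exact rationals are used so that the distributions sum to exactly 1.

```python
from fractions import Fraction as F


def step(mu, delta):
    """Return mu' with mu'(s') = sum over s of mu(s) * delta(s, s')."""
    nxt = {}
    for s, p in mu.items():
        for t, q in delta[s].items():
            nxt[t] = nxt.get(t, F(0)) + p * q
    return nxt


# Hypothetical two-state Markov chain; each row is a distribution over successors.
delta = {
    "s1": {"s1": F(1, 4), "s2": F(3, 4)},
    "s2": {"s1": F(1, 2), "s2": F(1, 2)},
}
mu0 = {"s1": F(1, 2), "s2": F(1, 2)}
mu1 = step(mu0, delta)
print(mu1)  # s1 has mass 1/2 * 1/4 + 1/2 * 1/2 = 3/8, s2 has mass 5/8
```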

In contrast to Markov chains, MDPs also feature non-determinism, which 
needs to be resolved in order to obtain probabilistic behaviour. This is achieved 
by (path) strategies, recipes to resolve non-determinism. Formally, a strategy 
on an MDP classically is defined as a function $\pi : \mathsf{FPaths}_M \to \Delta(\mathrm{Act})$, 
which given a finite path $\varrho = s_0 a_0 s_1 a_1 \ldots s_n$ yields a probability distribution 
$\pi(\varrho) \in \Delta(\mathrm{Act}(s_n))$ on the actions to be taken next. We write $\Pi$ to 
denote the set of all strategies. Fixing any strategy $\pi$ induces a Markov chain 
$M^\pi = (\mathsf{FPaths}_M, \delta^\pi)$, where for a state $\varrho = s_0 a_0 \ldots s_n \in \mathsf{FPaths}_M$ the successor 
distribution is defined as $\delta^\pi(\varrho, \varrho a_{n+1} s_{n+1}) = \pi(\varrho, a_{n+1}) \cdot \delta(s_n, a_{n+1}, s_{n+1})$. 
(Note that the state space of this Markov chain in general is countably infinite.) 
Consequently, for each strategy $\pi$ and initial distribution $\mu_0$ we also obtain a 
unique probability measure $\Pr_{M^\pi,\mu_0}$ on the infinite paths of $M$. (Technically, 
the MC $M^\pi$ induces a probability measure over paths in $M^\pi$, i.e. paths where 
each element is a finite path of $M$, however this can be directly projected to a 
measure over $\mathsf{IPaths}_M$.) 

A one-step strategy (also known as memoryless or positional strategy) corre- 
sponds to a fixed choice in each state, independent of the history, i.e. a mapping 
$\pi : S \to \Delta(\mathrm{Act})$. Fixing such a strategy induces a finite state Markov chain 
$M^\pi = (S, \delta^\pi)$, where $\delta^\pi(s, s') = \sum_{a \in \mathrm{Act}(s)} \pi(s)(a) \cdot \delta(s, a, s')$. We write $\Pi_1$ for 
the set of all one-step strategies. 
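
To make the induced chain concrete, the following Python sketch builds $\delta^\pi$ for the running example. The encoding is our own assumption: the dictionary layout, the function name `induced_chain`, and the action label "t" for the single unnamed action in B and C are hypothetical, and the transitions of B and C are taken as we read them from Fig. 1 and Example 1 (B moves to C, and C moves to A or stays in C with probability 1/2 each).

```python
from fractions import Fraction as F

# Running example MDP: (state, action) -> distribution over successor states.
mdp = {
    ("A", "a"): {"A": F(1)},
    ("A", "b"): {"B": F(1)},
    ("B", "t"): {"C": F(1)},
    ("C", "t"): {"A": F(1, 2), "C": F(1, 2)},
}


def induced_chain(mdp, pi):
    """delta^pi(s, s') = sum over a of pi(s)(a) * delta(s, a, s')."""
    chain = {}
    for (s, a), succ in mdp.items():
        row = chain.setdefault(s, {})
        weight = pi[s].get(a, F(0))
        for t, q in succ.items():
            if weight * q > 0:
                row[t] = row.get(t, F(0)) + weight * q
    return chain


# A one-step strategy that randomizes in A.
pi = {"A": {"a": F(2, 3), "b": F(1, 3)}, "B": {"t": F(1)}, "C": {"t": F(1)}}
chain = induced_chain(mdp, pi)
print(chain["A"])  # mass 2/3 stays in A, mass 1/3 moves to B
```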

A sequence of one-step strategies $(\pi_i) \in \Pi_1^\omega$ induces a general strategy which 
in each step $i$ and state $s$ chooses $\pi_i(s)$. Observe that aside from the state, such 
a strategy only depends on the current step, also called Markov strategy. 


2.2 MDPs as Distribution Transformers 


Probabilistic systems typically are viewed as “random generators” for paths, and 
we consequently investigate the (expected) behaviour of a generated path, i.e. 
path properties. However, in this work we follow a different view, and treat 
systems as transformers of distributions. Formally, fix a Markov chain M. For a 
given initial distribution $\mu_0$, we can define the distribution at step $i$ by 
$\mu_i(s) = \Pr_{M,\mu_0}[\{\rho \in \mathsf{IPaths}_M \mid \rho_i = s\}]$. We write $\mu_i = M(\mu_0, i)$ for the i-th distribution and 
$\mu_1 = M(\mu_0)$ for the “one-step” application of this transformation. Likewise, we 
obtain the same notion for an MDP $M$ combined with a strategy $\pi$, and write 
$\mu_i = M^\pi(\mu_0, i)$, $\mu_1 = M^\pi(\mu_0)$. In summary, for a given initial distribution, a 
Markov chain induces a unique stream of distributions, and an MDP provides 
one for each strategy. 

This naturally invites questions related to this induced stream of distribu- 
tions. In their path interpretation, queries such as reachability or safety, i.e. 
asking the probability of reaching or avoiding a set of states, allow for simple, 
polynomial time solutions [9,58]. However, the corresponding notions already 
are surprisingly difficult in the space of distributions. Thus, we restrict to the 
safety problem, which we introduce in the following. Intuitively, given a safe set 
of distributions over states $H \subseteq \Delta(S)$, we are interested in deciding whether 
the MDP can be controlled such that the stream of distributions always remains 
inside H. 




3 Problem Statement and Examples 


Let $M = (S, \mathrm{Act}, \delta)$ be an MDP and $H \subseteq \Delta(S)$ be a safe set. A distribution 
$\mu_0$ is called H-safe under $\pi$ if $M^\pi(\mu_0, i) \in H$ for all $i \ge 0$, and H-safe if there 
exists a strategy under which $\mu_0$ is safe. We mention two variants of the resulting 
decision problem as defined in [5]: 


— Initialized safety: Given an initial probability distribution $\mu_0$ and safe set $H$, 
decide whether $\mu_0$ is H-safe. 

— Uninitialized safety: Given a safe set $H$, decide whether there exists a distribution 
$\mu$ which is H-safe. 


Note that we have discussed neither the shape nor the representation of H, which 
naturally plays an important role for decidability and complexity. 

One may be tempted to think that the initialized variant is simpler, as more 
input is given. However, this problem is known to be POSITIVITY-hard¹ already 
for simple cases and already when H is defined in terms of rational constants! 


Theorem 1 ([3]). The initialized safety problem for Markov chains and H 
given as linear inequality constraint ($H = \{\mu \mid \mu(s) \le r\}$, $s \in S$, $r \in \mathbb{Q} \cap [0, 1]$) 
is POSITIVITY-hard. 


Proof. In [3, Corollary 4], the authors show that the inequality version of the 
Markov reachability problem, i.e. deciding whether there exists an $i$ such that 
$\mu_i(s) > r$ for a given rational $r$, is POSITIVITY-hard. The result follows by 
observing that safety is the negation of reachability. 


Thus, finding a decision procedure for this problem is unlikely, since it would 
answer several fundamental questions of number theory, see e.g. [41,55,56]. In 
contrast, the uninitialized problem is known to be decidable for safe sets H given 
as closed, convex polytopes (see [5] for details and [1] for a different approach 
specific to Markov chains). In a nutshell, we can restrict to the potential fixpoints 
of $M$, i.e. all distributions $\mu$ such that $\mu = M^\pi(\mu, i)$ for some strategy $\pi$. It 
turns out that this set of distributions is a polytope and the problem — glossing 
over subtleties — reduces to checking whether the intersection of H with this 
polytope is non-empty. However, we note that the solution of [5] does not yield 
the witness strategy. In the following, we thus primarily focus on the initialized 
question. In Sect. 6, we then show how our approach, which also synthesizes a 
witness strategy, is directly applicable to the uninitialized case. 

1 Intuitively, the POSITIVITY problem asks for a given rational (or integer or real) 
matrix $M$, whether $(M^n)_{1,1} > 0$ for all $n$ [54]. This problem (and its many variants) 
has been the subject of intense research over the last 10-15 years, see e.g. [55]. Yet, 
quite surprisingly, it still remains open in its full generality. 

In light of the daunting hardness results for the general initialized problem, 
we restrict to affine linear safe sets, i.e. H which are specified by a finite set of 
affine linear inequalities. Formally, these sets are of the form 
$H = \{\mu \in \Delta(S) \mid \bigwedge_{j=1}^{N} (c_0^j + \sum_{i=1}^{n} c_i^j \cdot \mu(s_i)) \ge 0\}$, 
where $S = \{s_1, \ldots, s_n\}$, $c_i^j$ are real-valued 
constants and $N$ is the number of affine linear inequalities that define $H$. Our 
problem formally is given by the following query. 


Problem Statement. Given an MDP $M$, initial distribution $\mu_0$, and affine 
linear safe set $H$, (i) decide whether $\mu_0$ is H-safe, and (ii) if yes, then synthesize 
a strategy for $M$ which ensures safety. 
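
Membership of a single distribution in an affine linear safe set is a direct evaluation of the defining inequalities. The Python sketch below illustrates this; the representation of each inequality as a coefficient list, the function name `in_safe_set`, and the concrete set used in the example are our own assumptions for illustration.

```python
from fractions import Fraction as F


def in_safe_set(mu, states, constraints):
    """Check c[0] + sum_i c[i] * mu(s_i) >= 0 for every coefficient list c.

    states fixes the numbering s_1, ..., s_n; missing states have mass 0.
    """
    return all(
        c[0] + sum(ci * mu.get(s, F(0)) for ci, s in zip(c[1:], states)) >= 0
        for c in constraints
    )


states = ["A", "B", "C"]
# Hypothetical safe set {mu | mu(C) >= 1/4}, written as -1/4 + mu(C) >= 0.
H = [[F(-1, 4), F(0), F(0), F(1)]]

uniform = {"A": F(1, 3), "B": F(1, 3), "C": F(1, 3)}
print(in_safe_set(uniform, states, H))                       # True
print(in_safe_set({"A": F(7, 8), "C": F(1, 8)}, states, H))  # False
```

An equality constraint such as μ(B) = 1/4 fits the same format by listing two opposite inequalities. Checking one distribution is easy; the difficulty of the problem lies in the quantification over all steps i.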


Note that the problem strictly subsumes the special case when H is defined in 
terms of rational constants, and our approach aims to solve both problems. Also, 
note that Theorem 1 still applies, i.e. this “simplified” problem is POSITIVITY- 
hard, too. We thus aim for a sound and relatively complete approach. Intuitively, 
this means that we restrict our search to a sub-space of possible solutions and 
within this space provide a complete answer. To give an intuition for the required 
reasoning, we provide an example safety query together with a manual proof. 


Example 1. Consider our running example from Fig. 1. Suppose the initial distri- 
bution is 4o = {A > 4, B > 4,C > $} and (affine linear) H = {u | u(C) > F}. 
This safety query is satisfiable, by, e.g., choosing action b, as we show in the 
following. First, observe that the (i+1)-th distribution is $\mu_{i+1}(A) = \frac{1}{2} \cdot \mu_i(C)$, 
$\mu_{i+1}(B) = \mu_i(A)$, and $\mu_{i+1}(C) = \mu_i(B) + \frac{1}{2} \mu_i(C)$. Thus, we cannot directly 
prove by induction that $\mu_i(C) \ge \frac{1}{4}$; we also need some information about $\mu_i(B)$ 
or $\mu_i(A)$ to exclude, e.g., $\mu_i = \{A \mapsto \frac{3}{4}, C \mapsto \frac{1}{4}\}$, where $\mu_{i+1}$ would violate 
the safety constraint. We invite the interested reader to try to prove that $\mu_0$ is 
indeed H-safe under the given strategy to appreciate the subtleties. 

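We proceed by proving that $\mu_i(C) \ge \frac{1}{4}$ and additionally $\mu_i(A) \le \mu_i(C)$ 
by induction. The base case follows immediately, thus suppose that $\mu_i$ satisfies 
these constraints. For $\mu_{i+1}(A) \le \mu_{i+1}(C)$ observe that $\mu_{i+1}(A) = \frac{1}{2} \mu_i(C)$ 
and $\mu_{i+1}(C) = \frac{1}{2} \mu_i(C) + \mu_i(B)$. Since $\mu_i(B) \ge 0$, the claim follows. To prove 
$\mu_{i+1}(C) \ge \frac{1}{4}$ observe that $\mu_i(A) \le \frac{1}{2}$ since $\mu_i(A) \le \mu_i(C)$ by induction hypothesis 
and distributions sum up to 1. Moreover, 
$\mu_{i+1}(C) = \mu_i(B) + \frac{1}{2} \mu_i(C) = \frac{1}{2} \mu_i(B) + \frac{1}{2} - \frac{1}{2} \mu_i(A)$ by again inserting the fact that distributions sum up to 1. 
Then, $\mu_{i+1}(C) = \frac{1}{2} - \frac{1}{2} \mu_i(A) + \frac{1}{2} \mu_i(B) \ge \frac{1}{2} - \frac{1}{2} \mu_i(A) \ge \frac{1}{2} - \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$. △ 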
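
The induction can be cross-checked numerically. The Python sketch below follows the stream of distributions under the strategy that always picks b and tests both constraints at every step. It is a sanity check and not a proof; the function names are hypothetical, distributions are encoded as triples (μ(A), μ(B), μ(C)), and the uniform starting distribution is an assumption of ours (any start satisfying both constraints serves as a base case).

```python
from fractions import Fraction as F


def step_always_b(mu):
    """One step of the running example when b is always chosen in A."""
    a, b, c = mu
    # A receives half of C, B receives all of A, C receives B plus half of C.
    return (c / 2, a, b + c / 2)


def holds_for(mu, steps):
    """Check mu_i(C) >= 1/4 and mu_i(A) <= mu_i(C) for i = 0, ..., steps - 1."""
    for _ in range(steps):
        a, b, c = mu
        if a + b + c != 1 or c < F(1, 4) or a > c:
            return False
        mu = step_always_b(mu)
    return True


mu0 = (F(1, 3), F(1, 3), F(1, 3))  # assumed initial distribution
print(holds_for(mu0, 50))

# The boundary case from the text: safe now, unsafe after one step.
bad = (F(3, 4), F(0), F(1, 4))
print(step_always_b(bad))  # mass on C drops to 1/8
```

The second call shows why the safety constraint alone is not inductive: the distribution with mass 3/4 on A satisfies μ(C) ≥ 1/4 but violates μ(A) ≤ μ(C), and its successor leaves the safe set.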


Thus, already for rather simple examples the reasoning is non-trivial. To further 
complicate things, the structure of strategies can also be surprisingly complex: 


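Example 2. Again consider our running example from Fig. 1 with initial distribution 
$\mu_0 = \{A \mapsto \frac{3}{4}, B \mapsto \frac{1}{4}\}$ and safe set $H = \{\mu \mid \mu(B) = \frac{1}{4}\}$. This 
safety condition is indeed satisfiable, however the (unique) optimal strategy 
requires both infinite memory as well as randomization with arbitrarily small 
fractions! In step 1, we require choosing a with $\frac{2}{3}$ and b with $\frac{1}{3}$ to satisfy the 
safety constraint in the second step, getting $\mu_1 = \{A \mapsto \frac{1}{2}, B \mapsto \frac{1}{4}, C \mapsto \frac{1}{4}\}$. 
For step 2, we require choosing both a and b with probability $\frac{1}{2}$ each, yielding 
$\mu_2 = \{A \mapsto \frac{3}{8}, B \mapsto \frac{1}{4}, C \mapsto \frac{3}{8}\}$. Continuing this strategy, we obtain at step $i$ 
that $\mu_i = \{A \mapsto \frac{1}{4} + \frac{1}{2^{i+1}}, B \mapsto \frac{1}{4}, C \mapsto \frac{1}{2} - \frac{1}{2^{i+1}}\}$ and action a is chosen with 
probability $1/(2^{i-1} + 1)$, converging to 0 (so b is chosen with probability converging to 1). △ 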
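
The closed forms in this example can be reproduced with exact arithmetic. In the Python sketch below, the probability of b in state A is chosen at each step so that exactly mass 1/4 arrives in B; the function name `safe_step` and the triple encoding (μ(A), μ(B), μ(C)) are our own assumptions, and the transitions of B and C are taken as we read them from Fig. 1 and Example 1.

```python
from fractions import Fraction as F


def safe_step(mu):
    """Pick b in A with the probability that puts mass 1/4 on B next.

    Returns the probability of action a and the successor distribution.
    Requires mu(A) >= 1/4.
    """
    a, b, c = mu
    p_b = F(1, 4) / a
    p_a = 1 - p_b
    nxt = (a * p_a + c / 2, a * p_b, b + c / 2)
    return p_a, nxt


mu = (F(3, 4), F(1, 4), F(0))
history = []
for i in range(10):
    p_a, nxt = safe_step(mu)
    history.append((mu, p_a))
    # Closed forms stated in the example.
    assert mu[0] == F(1, 4) + F(1, 2 ** (i + 1))
    assert mu[1] == F(1, 4)
    assert p_a == 1 / (F(2) ** (i - 1) + 1)
    mu = nxt

print([str(p) for _, p in history[:4]])  # 2/3, 1/2, 1/3, 1/5
```

Under these assumptions the probability of a shrinks at every step, so no finite-memory strategy with fixed randomization reproduces the sequence, which matches the claim that infinite memory and arbitrarily small fractions are needed.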

In the following, we provide two algorithms that handle both examples. Our first 
algorithm focusses on memoryless strategies, the second considers a certain type 
of infinite memory strategies. Essentially, the underlying idea is to automatically 
synthesize a strategy together with such inductive proofs of safety. 


4 Proving Safety by Invariants 


We now discuss our principled idea of proving safety by means of (inductive) 
invariants, taking inspiration from research on safety analysis in programs [20, 
27]. We first show that considering strategies which are purely based on the 
current distribution over states is sufficient. Then, we show that inductive 
invariants are a sound and complete certificate for safety. Together, we obtain 
that an initial distribution is H-safe if and only if there exists an invariant set 
$I$ and distribution strategy $\pi$ such that (i) the initial distribution is contained 
in $I$, (ii) $I$ is a subset of the safe set $H$, and (iii) $I$ is inductive under $\pi$, i.e. if 
$\mu \in I$ then $M^\pi(\mu) \in I$. In the following section, we then show how we search 
for invariants and distribution strategies of a particular shape. 


4.1 Distribution Strategies 


We show that distribution strategies $\pi : \Delta(S) \to \Pi_1$, yielding for each distribution 
over states a one-step strategy to take next, are sufficient for the problem 
at hand. More formally, we want to show that an H-safe distribution strategy 
exists if and only if there exists any H-safe strategy. 

First, observe that distribution strategies are a special case of regular path 
strategies. In particular, for any given initial distribution, we obtain a uniquely 
determined stream of distributions as $\mu_{i+1} = M^{\pi(\mu_i)}(\mu_i)$, i.e. the distribution 
$\mu_{i+1}$ is obtained by applying the one-step strategy $\pi(\mu_i)$ to $\mu_i$. In turn, this 
lets us define the Markov strategy $\pi_i(s) = \pi(\mu_i)(s)$. For simplicity, we identify 
distribution strategies with their induced path strategy. 

Next, we argue that restricting to distribution strategies is sufficient. 


Theorem 2. An initial distribution $\mu_0$ is H-safe if and only if there exists a 
distribution strategy $\pi$ such that $\mu_0$ is H-safe under $\pi$. 


Proof (Sketch). The full proof can be found in [4, Sec. 4.1]. Intuitively, only the 
“distribution” behaviour of a strategy is relevant and we can sufficiently replicate 
the behaviour of any safe strategy by a distribution strategy. 


In this way, each MDP corresponds to an (uncountably infinite) transition 
system $\mathcal{T}_M = (\Delta(S), T)$ where $(\mu, \mu') \in T$ if there exists a one-step strategy 
$\pi$ such that $\mu' = M^\pi(\mu)$. Note that $\mathcal{T}_M$ is a purely non-deterministic system, 
without any probabilistic behaviour. So, our decision problem is equivalent to 
asking whether the induced transition system $\mathcal{T}_M$ can be controlled in a safe 
way. Note that $\mathcal{T}_M$ is uncountably large and uncountably branching. 


4.2 Distributional Invariants for MDP Safety 


We now define distributional invariants in MDPs and show that they provide 
sound and complete certificates for proving initialized (and uninitialized) safety. 


Distributional Invariants in MDPs. Intuitively, a distributional invariant is a 
set of probability distributions over MDP states that contains all probability 
distributions that can arise from applying a strategy to an initial probability 
distribution, i.e. the complete stream $\mu_i$. Hence, similar to the safe set $H$, distributional 
invariants are also defined to be subsets of $\Delta(S)$. 


Definition 1 (Distributional Invariants). Let $\mu_0 \in \Delta(S)$ be a probability
distribution over $S$ and $\pi$ be a strategy in $M$. A set $I \subseteq \Delta(S)$ is said to be a
distributional invariant for $\mu_0$ under $\pi$ if the sequence of probability distributions
induced by applying the strategy $\pi$ to the initial probability distribution $\mu_0$ is
contained in $I$, i.e. if $M^{\pi}(\mu_0, i) \in I$ for each $i \geq 0$.

A distributional invariant $I$ is said to be inductive under $\pi$ if we furthermore
have that $M^{\pi}(\mu) \in I$ holds for any $\mu \in I$, i.e. if $I$ is “closed” under application
of $M^{\pi}$ to any probability distribution contained in $I$.
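As a minimal sketch of Definition 1, the following Python snippet fixes a memoryless strategy (so that the MDP behaves like a Markov chain), iterates the distribution transformer, and tests whether a prefix of the resulting stream stays inside a candidate set. The two-state chain, the candidate set, and all function names are hypothetical choices for illustration. Note that such a finite check can only refute a candidate; it does not prove that the candidate is an invariant.

```python
from fractions import Fraction as F

# Hypothetical two-state Markov chain: P[k][i] is the probability of
# moving from state k to state i (each row sums to 1).
P = [[F(1, 2), F(1, 2)],
     [F(1, 4), F(3, 4)]]

def transform(mu):
    """One application of the distribution transformer: mu' = mu * P."""
    n = len(mu)
    return [sum(mu[k] * P[k][i] for k in range(n)) for i in range(n)]

def in_candidate(mu):
    """Candidate set I = {mu | mu(s2) >= 1/2} (an assumption of this sketch)."""
    return mu[1] >= F(1, 2)

def stream_stays_inside(mu0, steps):
    """Check that mu_0, ..., mu_steps all lie in the candidate set."""
    mu = list(mu0)
    for _ in range(steps + 1):
        if not in_candidate(mu):
            return False
        mu = transform(mu)
    return True
```

For this particular chain the candidate happens to be inductive as well, since the new mass in the second state is $\frac{1}{2} + \frac{1}{4} x_2 \geq \frac{1}{2}$ for every distribution.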


Soundness and Completeness for MDP Safety. The following theorem shows 
that, in order to solve the initialized (and uninitialized) safety problem, one can 
equivalently search for a distributional invariant that is fully contained in H. 
Furthermore, it shows that one can without loss of generality restrict the search 
to inductive distributional invariants. 


Theorem 3 (Sound and Complete Certificate). Let $\mu_0 \in \Delta(S)$ be a probability
distribution over $S$, $\pi$ be a strategy in $M$, and $H \subseteq \Delta(S)$ be a safe set.
Then $\mu_0$ is $H$-safe under $\pi$ if and only if there exists an inductive distributional
invariant $I$ for $\mu_0$ and $\pi$ such that $I \subseteq H$.


The proof can be found in [4, Sec. 4.2]. 

Thus, in order to solve the initialized safety problem for $\mu_0$, it suffices to
search for (i) a strategy $\pi$ and (ii) an inductive distributional invariant $I$ for $\mu_0$
and $\pi$ such that $I \subseteq H$. On the other hand, in order to solve the uninitialized
safety problem, it suffices to search for (i) an initial probability distribution $\mu_0$,
(ii) a strategy $\pi$, and (iii) an inductive distributional invariant $I$ for $\mu_0$ and $\pi$ such
that $I \subseteq H$. In the following, we provide a fully automated, sound and relatively
complete method of deciding the existence of such an invariant and strategy.


5 Algorithms for Distributional Invariant Synthesis 


We now present two algorithms for automated synthesis of strategies and induc- 
tive distributional invariants towards solving distribution safety problems in 
MDPs. The two algorithms differ in the kind of strategies they consider and, 
as a consequence of differences in the involved expressions, also in their completeness
guarantees. For readability, we describe the algorithms in their basic
form applied to the initialized variant of the safety problem and discuss further
extensions in Sect. 6. In particular, our approach is also directly applicable to
the uninitialized variant, as we describe there. 

We say that an inductive distributional invariant is affine if it can be speci- 
fied in terms of (non-strict) affine inequalities, which we formalize below. Both 
algorithms jointly synthesize a strategy and an affine inductive distributional 
invariant by employing a template-based synthesis approach. In particular, they 
fix symbolic templates for each object that needs to be synthesized, encode 
the defining properties of each object as constraints over unknown template 
variables, and solve the system of constraints by reduction to the existential 
first-order theory of the reals. 

For example, a template for an affine linear constraint on distributions $\Delta(S)$
is given by $\mathrm{aff}(\mu) = (c_0 + c_1 \cdot \mu(s_1) + \dots + c_n \cdot \mu(s_n) \geq 0)$. Here, the variables $c_0$
to $c_n$, written in grey for emphasis, are the template variables. For fixed values
of these variables the expression $\mathrm{aff}$ is a concrete affine linear predicate over
distributions. Thus, we can ask questions like “Do there exist values for $c_i$ such
that for all distributions $\mu$ we have that $\mathrm{aff}(\mu)$ implies $\mathrm{aff}(M^{\pi}(\mu))$?”. This is
a sentence in the theory of the reals, albeit with quantifier alternation. As a
next step, template-based synthesis approaches then employ various quantifier
elimination techniques to convert such expressions into equisatisfiable sentences
in, e.g., the existential theory of the reals, which is decidable in PSPACE [15].
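To illustrate the distinction between a template and its instances, the sketch below turns a concrete choice of the coefficients $c_0, \dots, c_n$ into an ordinary predicate over distributions. The helper name `make_aff` and the chosen coefficient values are hypothetical; in the actual synthesis the coefficients are unknowns handed to a solver rather than fixed by hand.

```python
from fractions import Fraction as F

def make_aff(coeffs):
    """Instantiate the affine template for concrete values (c_0, c_1, ..., c_n).

    Returns a predicate over distributions mu, given as a list
    [mu(s_1), ..., mu(s_n)].
    """
    c0, rest = coeffs[0], coeffs[1:]

    def aff(mu):
        return c0 + sum(c * m for c, m in zip(rest, mu)) >= 0

    return aff

# Concrete instance: -1/4 + 0*mu(s1) + 0*mu(s2) + 1*mu(s3) >= 0,
# i.e. the predicate mu(s3) >= 1/4 over a three-state system.
aff = make_aff([F(-1, 4), F(0), F(0), F(1)])
```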


Difference between the Algorithms. Our two algorithms differ in their appli- 
cability and the kind of completeness guarantees that they provide. In terms 
of applicability, the first algorithm only considers memoryless strategies, while 
the second algorithm searches for distribution strategies specified as fractions 
of affine linear expressions. (We discuss an extension to rational functions in 
Sect. 6.) In terms of completeness guarantees, the first algorithm is (relatively) 
complete in the sense that it is guaranteed to compute a memoryless strategy 
and an affine inductive distributional invariant that prove safety whenever they 
exist. In contrast, the second algorithm does not provide the same level of com- 
pleteness. 


Notation. In what follows, we write $\equiv$ to denote (syntactic) equivalence of expressions,
to distinguish from relational symbols used inside these expressions, such
as “$=$”. For example, $\Phi(x) \equiv x = 0$ means that $\Phi(x)$ is the predicate $x = 0$.
Moreover, $(x_1, \dots, x_n)$ denotes a symbolic probability distribution over the state
space $S = \{s_1, \dots, s_n\}$, where $x_i$ is a symbolic variable that encodes the probability
of the system being in $s_i$. We use boldface notation $\mathbf{x} = (x_1, \dots, x_n)$
to denote the vector of symbolic variables. Thus, the above example would be
written $\mathrm{aff}(\mathbf{x}) \equiv c_0 + c_1 \cdot x_1 + \dots + c_n \cdot x_n \geq 0$. Since we often require vectors
to represent a distribution, we write $\mathbf{x} \in \Delta(S)$ as abbreviation for the predicate
$\bigwedge_{i=1}^{n} (0 \leq x_i \leq 1) \wedge (\sum_{i=1}^{n} x_i = 1)$.


Algorithm Input and Assumptions. Both algorithms take as input an MDP
$M = (S, \mathrm{Act}, \delta)$ with $S = \{s_1, \dots, s_n\}$. They also take as input a safe set $H \subseteq \Delta(S)$.
We assume that $H$ is specified by a boolean predicate over $n$ variables as a logical
conjunction of $N_H \in \mathbb{N}_0$ affine inequalities, and that it has the form


$H(\mathbf{x}) \equiv (\mathbf{x} \in \Delta(S)) \wedge \bigwedge_{i=1}^{N_H} (h^i(\mathbf{x}) \geq 0)$,


where the first term imposes that $\mathbf{x}$ is a probability distribution over $S$ and
$h^i(\mathbf{x}) = h^i_0 + h^i_1 \cdot x_1 + \dots + h^i_n \cdot x_n$ is an affine expression over $\mathbf{x}$ with real-valued
coefficients $h^i_j$ for each $i \in [N_H]$ and $j \in \{0, \dots, n\}$. (Note that $h^i_j$ are not
template variables but fixed values, given as input.) Next, the algorithms take
as input an initial probability distribution $\mu_0 \in \Delta(S)$. Finally, the algorithms
also take as input technical parameters. Intuitively, these describe the size of
used symbolic templates, explained later. For the remainder of the section, fix
an initialized safety problem, i.e. an MDP $M$, safe set $H$ of the required form, and an
initial distribution $\mu_0$.


5.1 Synthesis of Affine Invariants and Memoryless Strategies 


We start by presenting our first algorithm, which synthesizes memoryless strate- 
gies and affine inductive distributional invariants. We refer to this algorithm as 
AlgMemLess. The algorithm proceeds in the following four steps: 


1. Setting up Templates. The algorithm fixes symbolic templates for the memoryless
strategy $\pi$ and the affine inductive distributional invariant $I$. Note that
the values of the symbolic template variables at this step are unknown and
are to be computed in subsequent steps.

2. Constraint Collection. The algorithm collects the constraints which encode
that $\pi$ is a (memoryless) strategy, that $I$ contains the initial probability
distribution $\mu_0$, that $I$ is an inductive distributional invariant with respect to
$\pi$ and $\mu_0$, and that $I$ is contained within $H$. This step yields a system of
affine constraints over symbolic template variables that contain universal and
existential quantifiers.

3. Quantifier Elimination. The algorithm eliminates universal quantifiers from
the above constraints to reduce them to a purely existentially quantified
system of polynomial constraints over the symbolic template variables.
Concretely, the first algorithm achieves this by application of Farkas’ lemma.

4. Constraint Solving. The algorithm solves the resulting system of constraints
by using an off-the-shelf solver to compute concrete values for symbolic template
variables specifying the strategy $\pi$ and invariant $I$.


We now describe each step in detail. 


Step 1: Setting up Templates. The algorithm sets templates for $\pi$ and $I$ as
follows:


- Since this algorithm searches for memoryless strategies, the probability of
taking an action $a_j$ in state $s_i$ is always the same, independent of the current
distribution. Hence, our template for $\pi$ consists of a symbolic template variable
$p_{s_i,a_j}$ for each $s_i \in S$, $a_j \in \mathrm{Act}(s_i)$. We write $p_{s_i,\circ} = (p_{s_i,a_1}, \dots, p_{s_i,a_m})$
to refer to the corresponding distribution in state $s_i$.
- The template of $I$ is given by a boolean predicate specified by a conjunction
of $N_I$ affine inequalities, where $N_I$ is the template size and is an algorithm
parameter. In particular, the template of $I$ looks as follows:


$I(\mathbf{x}) \equiv (\mathbf{x} \in \Delta(S)) \wedge \bigwedge_{i=1}^{N_I} (a^i_0 + a^i_1 \cdot x_1 + \dots + a^i_n \cdot x_n \geq 0)$.

The first predicate enforces that $I$ only contains vectors that define probability
distributions over $S$.
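To get a feeling for the number of unknowns these templates introduce, the following sketch enumerates them for a small system: one probability per state-action pair and $n + 1$ coefficients per affine inequality of the invariant. The function name, the naming scheme of the variables, and the three-state example are hypothetical.

```python
def template_variables(actions, n_inv):
    """List the unknowns of the memoryless-strategy and invariant templates.

    actions maps each state to its available actions; n_inv is the
    template size N_I. All names are illustrative.
    """
    states = list(actions)
    # One probability p[s,a] per state-action pair.
    strategy_vars = [f"p[{s},{a}]" for s in states for a in actions[s]]
    # n + 1 coefficients a[i,0..n] per affine inequality of the invariant.
    invariant_vars = [
        f"a[{i},{j}]"
        for i in range(1, n_inv + 1)
        for j in range(len(states) + 1)
    ]
    return strategy_vars, invariant_vars

# Hypothetical three-state MDP in which only state A offers a choice.
strategy_vars, invariant_vars = template_variables(
    {"A": ["a", "b"], "B": ["a"], "C": ["a"]}, n_inv=1
)
```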


Step 2: Constraint Collection. We now collect the constraints over symbolic
template variables which encode that $\pi$ is a memoryless strategy, that $I$ contains
the initial distribution $\mu_0$, that $I$ is an inductive distributional invariant under
$\pi$, and that $I$ is contained in $H$.


- For $\pi$ to be a strategy, we only need to ensure that each $p_{s_i,\circ}$ is a probability
distribution over the set of available actions at every state $s_i$. Thus, we set


$\Phi_{\mathrm{strat}} \equiv \bigwedge_{i=1}^{n} (p_{s_i,\circ} \in \Delta(\mathrm{Act}(s_i)))$.


- For $I$ to be a distributional invariant for $\pi$ and $\mu_0$ as well as to be inductive,
it suffices to enforce that $I$ contains $\mu_0$ and that $I$ is closed under application
of $\pi$. Thus, we collect two constraints:


$\Phi_{\mathrm{initial}} \equiv I(\mu_0) \equiv \bigwedge_{i=1}^{N_I} (a^i_0 + a^i_1 \cdot \mu_0(s_1) + \dots + a^i_n \cdot \mu_0(s_n) \geq 0)$, and
$\Phi_{\mathrm{inductive}} \equiv (\forall \mathbf{x} \in \mathbb{R}^n.\ I(\mathbf{x}) \Rightarrow I(\mathrm{step}(\mathbf{x})))$,


where $\mathrm{step}(\mathbf{x})(x_i) = \sum_{s_k \in S,\, a_j \in \mathrm{Act}(s_k)} p_{s_k,a_j} \cdot \delta(s_k, a_j, s_i) \cdot x_k$ yields the distribution
after applying one step of the strategy induced by $\Phi_{\mathrm{strat}}$ to $\mathbf{x}$.
- For $I$ to be contained in $H$, we enforce the constraint:


$\Phi_{\mathrm{safe}} \equiv (\forall \mathbf{x} \in \mathbb{R}^n.\ I(\mathbf{x}) \Rightarrow H(\mathbf{x}))$.
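The one-step update used in the inductiveness constraint can be sketched for concrete numbers as follows. The snippet evaluates the step function for a fixed memoryless strategy and checks the strategy constraint; the two-state MDP, the strategy values, and the function names are hypothetical, whereas in the algorithm itself the probabilities are symbolic template variables.

```python
from fractions import Fraction as F

# Hypothetical MDP: delta[state][action] maps successor -> probability.
delta = {
    "s1": {"a": {"s1": F(1)}, "b": {"s2": F(1)}},
    "s2": {"a": {"s1": F(1, 2), "s2": F(1, 2)}},
}
# Memoryless strategy: p[state][action] is a constant probability.
p = {"s1": {"a": F(1, 3), "b": F(2, 3)}, "s2": {"a": F(1)}}

def is_strategy(p, delta):
    """Strategy constraint: each p[s] is a distribution over Act(s)."""
    return all(
        set(p[s]) == set(delta[s])
        and all(q >= 0 for q in p[s].values())
        and sum(p[s].values()) == 1
        for s in delta
    )

def step(x, p, delta):
    """step(x)(s_i) = sum over k, j of p[s_k][a_j] * delta(s_k, a_j, s_i) * x[s_k]."""
    out = {s: F(0) for s in delta}
    for sk, actions in delta.items():
        for aj, successors in actions.items():
            for si, prob in successors.items():
                out[si] += p[sk][aj] * prob * x[sk]
    return out
```

Since every factor is a probability, `step` maps distributions to distributions, which is why the invariant template can insist on $\mathbf{x} \in \Delta(S)$ throughout.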


Step 3: Quantifier Elimination. Constraints $\Phi_{\mathrm{strat}}$ and $\Phi_{\mathrm{initial}}$ are purely existentially
quantified over symbolic template variables, thus we can solve them
directly. However, $\Phi_{\mathrm{inductive}}$ and $\Phi_{\mathrm{safe}}$ contain both universal and existential
quantifiers, which are difficult to handle. In what follows, we show how the
algorithm translates these constraints into equisatisfiable purely existentially
quantified constraints. In particular, our translation exploits the fact that both
$\Phi_{\mathrm{inductive}}$ and $\Phi_{\mathrm{safe}}$ can, upon splitting the conjunctions on the right-hand side
of implications into conjunctions of implications, be expressed as conjunctions
of constraints of the form


$\forall \mathbf{x} \in \mathbb{R}^n.\ (\mathrm{affexp}_1(\mathbf{x}) \geq 0) \wedge \dots \wedge (\mathrm{affexp}_N(\mathbf{x}) \geq 0) \Rightarrow (\mathrm{affexp}(\mathbf{x}) \geq 0)$.


Here, each $\mathrm{affexp}_i(\mathbf{x})$ and $\mathrm{affexp}(\mathbf{x})$ is an affine expression over $\mathbf{x}$ whose affine
coefficients are either concrete real values or symbolic template variables.

In particular, we use Farkas’ lemma [31] to remove universal quantification
and translate the constraint into an equisatisfiable existentially quantified system
of constraints over the symbolic template variables, as well as fresh auxiliary
variables that are introduced by the translation. For completeness, we briefly
recall (a strengthened and adapted version of) Farkas’ lemma.


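Lemma 1 ([31,37]). Let $\mathcal{X} = \{x_1, \dots, x_n\}$ be a finite set of real-valued variables,
and consider the following system of $N \in \mathbb{N}$ affine inequalities over $\mathcal{X}$:


$\Phi: \begin{cases} c^1_0 + c^1_1 \cdot x_1 + \dots + c^1_n \cdot x_n \geq 0 \\ \quad \vdots \\ c^N_0 + c^N_1 \cdot x_1 + \dots + c^N_n \cdot x_n \geq 0 \end{cases}$


Suppose that $\Phi$ is satisfiable. Then $\Phi$ entails an affine inequality
$\phi \equiv c_0 + c_1 \cdot x_1 + \dots + c_n \cdot x_n \geq 0$, i.e. $\Phi \Rightarrow \phi$, if and only if $\phi$ can be written
as a non-negative linear combination of affine inequalities in $\Phi$, i.e. if and only if
there exist $y_1, \dots, y_N \geq 0$ such that
$c_1 = \sum_{j=1}^{N} y_j \cdot c^j_1, \ \dots, \ c_n = \sum_{j=1}^{N} y_j \cdot c^j_n$.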


Note that, for any implication appearing in $\Phi_{\mathrm{inductive}}$ and $\Phi_{\mathrm{safe}}$, the system
of constraints on the left-hand side is simply $I(\mathbf{x})$, and the satisfiability of $I(\mathbf{x})$
is enforced by $\Phi_{\mathrm{initial}}$. Hence, we may apply Farkas’ lemma to translate each
constraint with universal quantification into an equivalent purely existentially
quantified constraint. In particular, for any constraint of the form


$\forall \mathbf{x} \in \mathbb{R}^n.\ (\mathrm{affexp}_1(\mathbf{x}) \geq 0) \wedge \dots \wedge (\mathrm{affexp}_N(\mathbf{x}) \geq 0) \Rightarrow (\mathrm{affexp}(\mathbf{x}) \geq 0)$,


we introduce fresh template variables $y_1, \dots, y_N$ and translate it into the system
of purely existentially quantified constraints


$(y_1 \geq 0) \wedge \dots \wedge (y_N \geq 0) \wedge (\mathrm{affexp}(\mathbf{x}) \equiv_F y_1 \cdot \mathrm{affexp}_1(\mathbf{x}) + \dots + y_N \cdot \mathrm{affexp}_N(\mathbf{x}))$.


Here, we use $\mathrm{affexp}(\mathbf{x}) \equiv_F y_1 \cdot \mathrm{affexp}_1(\mathbf{x}) + \dots + y_N \cdot \mathrm{affexp}_N(\mathbf{x})$ to denote
the set of $n + 1$ equalities over the symbolic template variables and $y_1, \dots, y_N$
which equate the constant coefficients as well as the linear coefficients of each
$x_i$ on two sides of the equivalence, i.e. exactly those equalities which we obtain
from applying Farkas’ lemma. We highlight that the expressions affexp are only
affine linear for fixed existentially quantified variables, i.e. as constraints over
the unknowns they are in general quadratic.
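The coefficient matching behind this translation can be sketched for fully concrete numbers. The snippet below checks, for given multipliers, the $n + 1$ equalities between a target affine expression and a non-negative combination of premises. The function name, the two-variable example, and the multiplier values are hypothetical; in the algorithm the multipliers and the template coefficients are all unknowns, which is where the quadratic terms come from.

```python
from fractions import Fraction as F

def farkas_certifies(premises, target, ys):
    """Check the coefficient equalities for given multipliers ys.

    Each affine expression is a coefficient list [c_0, c_1, ..., c_n]
    standing for c_0 + c_1*x_1 + ... + c_n*x_n >= 0.
    """
    if len(ys) != len(premises) or any(y < 0 for y in ys):
        return False
    combo = [
        sum(y * e[i] for y, e in zip(ys, premises))
        for i in range(len(target))
    ]
    return combo == list(target)

# Premises over (x1, x2): membership in the simplex plus x2 >= 1/2.
premises = [
    [F(0), F(1), F(0)],       # x1 >= 0
    [F(0), F(0), F(1)],       # x2 >= 0
    [F(-1), F(1), F(1)],      # x1 + x2 - 1 >= 0
    [F(1), F(-1), F(-1)],     # 1 - x1 - x2 >= 0
    [F(-1, 2), F(0), F(1)],   # x2 - 1/2 >= 0
]
target = [F(-1, 4), F(0), F(1)]  # x2 - 1/4 >= 0
ys = [F(1, 4), F(1, 4), F(0), F(1, 4), F(1)]
```

The simplex constraints are what allow a constant slack (here $\frac{1}{4}$) to be expressed as a combination of premises, via $1 = (1 - x_1 - x_2) + x_1 + x_2$.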


Step 4: Constraint Solving. Finally, we feed the resulting system of existentially
quantified polynomial constraints over the symbolic template variables as well as
the auxiliary variables introduced by applying Farkas’ lemma to an off-the-shelf
constraint solver. If the solver outputs a solution, we conclude that the computed
invariant $I$ is an inductive distributional invariant for the strategy $\pi$ and initial
distribution $\mu_0$, and that $I$ is contained in $H$. Therefore, by Theorem 3, we
conclude that $\mu_0$ is $H$-safe under $\pi$.


$\Phi_{\mathrm{initial}}$: $c_0 + c_1 \cdot \frac{1}{3} + c_2 \cdot \frac{1}{3} + c_3 \cdot \frac{1}{3} \geq 0$
$\Phi_{\mathrm{safe}}$: $(c_0 + c_1 \cdot A + c_2 \cdot B + c_3 \cdot C \geq 0) \Rightarrow C \geq \frac{1}{4}$
$\Phi_{\mathrm{inductive}}$: $(c_0 + c_1 \cdot A + c_2 \cdot B + c_3 \cdot C \geq 0) \Rightarrow c_0 + c_1 \cdot (p_{A,a} \cdot A + \frac{1}{2} C) + c_2 \cdot p_{A,b} \cdot A + c_3 \cdot (B + \frac{1}{2} C) \geq 0$
$\Phi_{\mathrm{strat}}$: $p_{A,a} \geq 0$, $p_{A,b} \geq 0$, $p_{A,a} + p_{A,b} = 1$


Fig. 2. List of constraints generated in Step 2 for Example 1 with $N_I = 1$. The
uppercase letters correspond to variables indicating the distribution in these states,
i.e. $A$ refers to $\mu(A)$. These also are the universally quantified variables, which will be
handled by the quantifier elimination in Step 3. The template variables are written in
grey. For readability, we omit the constraints required for state distributions $\mu \in \Delta(S)$,
i.e. $A \geq 0$ etc. The actual query sent to the solver in Step 4 after quantifier elimination
comprises 27 constraints with 21 variables.


Theorem 4. Soundness: Suppose AlgMemLess returns a memoryless strategy $\pi$
and an affine inductive distributional invariant $I$. Then, $\mu_0$ is $H$-safe under $\pi$.

Completeness: If there exist a memoryless strategy $\pi$ and an affine inductive
distributional invariant $I$ such that $I \subseteq H$ and $\mu_0$ is $H$-safe under $\pi$, then there
exists a minimal value of the template size $N_I \in \mathbb{N}$ such that $\pi$ and $I$ are produced
by AlgMemLess.

Complexity: The runtime of AlgMemLess is in PSPACE in the size of the
MDP, the encoding of the safe set $H$ and the template size parameter $N_I \in \mathbb{N}$.


The proof can be found in [4, Sec. 5.1]. We comment on the PSPACE upper 
bound on the complexity of AlgMemLess. The upper bound holds since the appli- 
cation of Farkas’ lemma reduces synthesis to solving a sentence in the existential 
first-order theory of the reals and since the size of the sentence is polynomial in 
the sizes of the MDP, the encoding of the safe set H and the invariant template 
size $N_I$. However, it is unclear whether the resulting constraints could be solved
more efficiently, and the best known upper bound on the time complexity of 
algorithms for template-based affine inductive invariant synthesis in programs is 
also PSPACE [8,27]. Designing more efficient algorithms for solving constraints 
of this form would lead to better algorithms both for the safety problem stud- 
ied in this work and for template-based affine inductive invariant synthesis in 
programs. 


Example 3. For completeness, we provide the constraints generated in Step 2
for Example 1 with $N_I = 1$ for readability, i.e. our running example Fig. 1 with
$\mu_0 = \{A \mapsto \frac{1}{3}, B \mapsto \frac{1}{3}, C \mapsto \frac{1}{3}\}$ and $H = \{\mu \mid \mu(C) \geq \frac{1}{4}\}$, in Fig. 2.


To conclude this section, we emphasize that our algorithm simultaneously
synthesizes both the invariant and the witnessing strategy, which is the key
component to achieve relative completeness.


5.2 Synthesis of Affine Invariants and General Strategies


We now present our second algorithm, which additionally synthesizes distribution 
strategies (of a particular shape) together with an affine inductive distributional 
invariant. We refer to it as AlgDist. The second algorithm proceeds in four steps
analogous to those of the first algorithm, AlgMemLess. Hence, in the interest of
space, we only discuss the differences compared to AlgMemLess.


Step 1: Setting up Templates. The algorithm sets up templates for $\pi$ and $I$. The
template for $I$ is defined analogously as in Sect. 5.1. However, as we now want
to search for a strategy $\pi$ that need not be memoryless but instead may depend
on the current distribution, we need to consider a more general template. In
particular, the template for the probability $p_{s_i,a_j}$ of taking an action $a_j$ in state
$s_i$ is no longer a constant value. Instead, $p_{s_i,a_j}(\mathbf{x})$ is a function of the probability
distribution $\mathbf{x}$ of the current state of the MDP, and we define its template to be
a quotient of two affine expressions for each $s_i \in S$ and $a_j \in \mathrm{Act}(s_i)$:


$p_{s_i,a_j}(\mathbf{x}) = \frac{\mathrm{num}(s_i,a_j)(\mathbf{x})}{\mathrm{den}(s_i)(\mathbf{x})} = \frac{r^{i,j}_0 + r^{i,j}_1 \cdot x_1 + \dots + r^{i,j}_n \cdot x_n}{s^i_0 + s^i_1 \cdot x_1 + \dots + s^i_n \cdot x_n}$.


(In Sect. 6, we discuss how to extend our approach to polynomial expressions for
numerator and denominator, i.e. rational functions.) Note that the coefficients
in the numerator depend both on the state $s_i$ and the action $a_j$, whereas the
coefficients in the denominator depend only on the state $s_i$. This is because we
only use the affine expression in the denominator as a normalization factor to
ensure that $p_{s_i,a_j}$ indeed defines a probability.
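As a small sketch of how such a quotient template behaves once its coefficients are fixed, the snippet below evaluates the action probabilities of a single state at a given distribution. It also checks the sanity conditions one would expect of a normalization factor: non-negative numerators, a positive denominator, and numerators summing to the denominator. The function names and the concrete coefficients are hypothetical.

```python
from fractions import Fraction as F

def affine(coeffs, x):
    """Evaluate c_0 + c_1*x_1 + ... + c_n*x_n."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x))

def action_probs(num_coeffs, den_coeffs, x):
    """Evaluate the quotient template of one state at distribution x.

    num_coeffs maps each action to the coefficients of its numerator;
    den_coeffs holds the coefficients of the shared denominator.
    Returns None if the values do not form a distribution at x.
    """
    den = affine(den_coeffs, x)
    nums = {a: affine(c, x) for a, c in num_coeffs.items()}
    if den <= 0 or any(v < 0 for v in nums.values()) or sum(nums.values()) != den:
        return None
    return {a: v / den for a, v in nums.items()}

# Hypothetical instance over two states: p_a = (1 + x1) / (1 + x1 + x2),
# p_b = x2 / (1 + x1 + x2).
num_coeffs = {"a": [F(1), F(1), F(0)], "b": [F(0), F(0), F(1)]}
den_coeffs = [F(1), F(1), F(1)]
```

Here the strategy plays action `a` more often the more mass sits in the first state, which a memoryless strategy cannot express.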


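Step 2: Constraint Collection. As before, the algorithm now collects the constraints
over symbolic template variables which encode that $\pi$ is a strategy, that
$I$ is an inductive distributional invariant, and that $I$ is contained in $H$. The
constraints $\Phi_{\mathrm{initial}}$, $\Phi_{\mathrm{inductive}}$, and $\Phi_{\mathrm{safe}}$ are defined analogously as in Sect. 5.1,
with the necessary adaptation to $\mathrm{step}(\mathbf{x})$. For the strategy constraint $\Phi_{\mathrm{strat}}$ we
now need to take additional care to ensure that each quotient template defined
above does not induce division by 0 and that these values indeed correspond
to a distribution over the available actions. We ensure this by the following
constraint:


$\Phi_{\mathrm{strat}} \equiv \forall \mathbf{x} \in \mathbb{R}^n.\ \mathbf{x} \in \Delta(S) \Rightarrow \bigwedge_{i=1}^{n} \Big( \bigwedge_{a_j \in \mathrm{Act}(s_i)} (\mathrm{num}(s_i,a_j)(\mathbf{x}) \geq 0) \wedge (\mathrm{den}(s_i)(\mathbf{x}) \geq 1) \wedge \big( \sum_{a_j \in \mathrm{Act}(s_i)} \mathrm{num}(s_i,a_j)(\mathbf{x}) = \mathrm{den}(s_i)(\mathbf{x}) \big) \Big)$.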

The first two constraints ensure that all numerators are non-negative and that we never
divide by 0. The third means that the numerators sum up to the denominator.
Together, this ensures the desired result, i.e. $p_{s_i,\circ}(\mathbf{x}) \in \Delta(\mathrm{Act}(s_i))$ whenever
$\mathbf{x} \in \Delta(S)$. Note that the $\geq 1$ constraint for the denominator can be replaced by
an arbitrary constant $> 0$, since we can always rescale all involved coefficients.


Step 3: Quantifier Elimination. The constraints $\Phi_{\mathrm{strat}}$, $\Phi_{\mathrm{initial}}$, and $\Phi_{\mathrm{safe}}$ can be
handled analogously to Sect. 5.1. In particular, by applying Farkas’ lemma these
can be translated into an equisatisfiable purely existentially quantified system
of polynomial constraints, and our algorithm applies this translation.

However, the constraint $\Phi_{\mathrm{inductive}}$ now involves quotients of affine expressions:
Upon splitting the conjunction on the right-hand side of the implication
in $\Phi_{\mathrm{inductive}}$ into a conjunction of implications, the inequalities on the right-hand
side of these implications contain templates for strategy probabilities $p_{s_i,a_j}(\mathbf{x})$.
The algorithm removes the quotients by multiplying both sides of the inequality
by denominators of each quotient. (Recall that each denominator is positive
by the constraint $\Phi_{\mathrm{strat}}$.) This results in the multiplication of symbolic affine
expressions, hence $\Phi_{\mathrm{inductive}}$ becomes a conjunction of implications of the form


$\forall \mathbf{x} \in \mathbb{R}^n.\ (\mathrm{affexp}_1(\mathbf{x}) \geq 0) \wedge \dots \wedge (\mathrm{affexp}_N(\mathbf{x}) \geq 0) \Rightarrow (\mathrm{polyexp}(\mathbf{x}) \geq 0)$.


Here, each $\mathrm{affexp}_i(\mathbf{x})$ is an affine expression over $\mathbf{x}$, but $\mathrm{polyexp}(\mathbf{x})$ is now a
polynomial expression over $\mathbf{x}$. Hence we cannot apply a Farkas’ lemma-style
result to remove universal quantifiers.
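To make the origin of the polynomial concrete, the following worked equation sketches how one inequality of the invariant, evaluated after a step of a quotient-template strategy, turns into a polynomial inequality. The abbreviation of the invariant coefficients as $a_0, \dots, a_n$ (dropping the row index) is a notational assumption of this sketch, and the coefficients $a_i$ should not be confused with the actions $a_j$.

```latex
% One inequality of I(step(x)) for a distribution strategy,
% with invariant coefficients a_0, ..., a_n (row index omitted).
a_0 + \sum_{i=1}^{n} a_i \sum_{k=1}^{n} \sum_{a_j \in \mathrm{Act}(s_k)}
    \frac{\mathrm{num}(s_k,a_j)(\mathbf{x})}{\mathrm{den}(s_k)(\mathbf{x})}
    \cdot \delta(s_k,a_j,s_i) \cdot x_k \;\geq\; 0

% Multiplying by \prod_{l=1}^{n} den(s_l)(x), which is positive,
% preserves the direction of the inequality and clears all quotients:
a_0 \prod_{l=1}^{n} \mathrm{den}(s_l)(\mathbf{x})
  + \sum_{i=1}^{n} a_i \sum_{k=1}^{n} \sum_{a_j \in \mathrm{Act}(s_k)}
    \mathrm{num}(s_k,a_j)(\mathbf{x}) \cdot \delta(s_k,a_j,s_i) \cdot x_k
    \prod_{l \neq k} \mathrm{den}(s_l)(\mathbf{x}) \;\geq\; 0
```

In this form the left-hand side has degree up to $n + 1$ in $\mathbf{x}$, which is why an affine certificate no longer suffices.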

Instead, we motivate our translation by recalling Handelman’s theorem [38],
which characterizes strictly positive polynomials over a set of affine inequalities.
It will allow us to soundly translate $\Phi_{\mathrm{inductive}}$ into an existentially quantified system
of constraints over the symbolic template variables, as well as fresh auxiliary
variables that are introduced by the translation.


Theorem 5 ([38]). Let $\mathcal{X} = \{x_1, \dots, x_n\}$ be a finite set of real-valued variables,
and consider the following system of $N \in \mathbb{N}$ non-strict affine inequalities over
$\mathcal{X}$:


$\Phi: \begin{cases} c^1_0 + c^1_1 \cdot x_1 + \dots + c^1_n \cdot x_n \geq 0 \\ \quad \vdots \\ c^N_0 + c^N_1 \cdot x_1 + \dots + c^N_n \cdot x_n \geq 0 \end{cases}$


Let $\mathrm{Prod}(\Phi) = \{\prod_{i=1}^{t} \phi_i \mid t \in \mathbb{N}_0,\ \phi_i \in \Phi\}$ be the set of all products of finitely
many affine expressions in $\Phi$, where the product of 0 affine expressions is a
constant expression 1. Suppose that $\Phi$ is satisfiable and that $\{\mathbf{y} \mid \mathbf{y} \models \Phi\}$, the
set of values satisfying $\Phi$, is topologically compact, i.e. closed and bounded. Then
$\Phi$ entails a polynomial inequality $\phi(\mathbf{x}) > 0$ if and only if $\phi$ can be written as
a non-negative linear combination of finitely many products in $\mathrm{Prod}(\Phi)$, i.e. if
and only if there exist $y_1, \dots, y_n \geq 0$ and $\phi_1, \dots, \phi_n \in \mathrm{Prod}(\Phi)$ such that
$\phi = y_1 \cdot \phi_1 + \dots + y_n \cdot \phi_n$.


Notice that we cannot directly apply Handelman’s theorem to a constraint

$\forall \mathbf{x} \in \mathbb{R}^n.\ (\mathrm{affexp}_1(\mathbf{x}) \geq 0) \wedge \dots \wedge (\mathrm{affexp}_N(\mathbf{x}) \geq 0) \Rightarrow (\mathrm{polyexp}(\mathbf{x}) \geq 0)$,


since the polynomial inequality on the right-hand side of the implication is non-strict
whereas the polynomial inequality in Handelman’s theorem is strict. However,
the direction needed for the soundness of translation holds even with the
non-strict polynomial inequality on the right-hand side. In particular, it clearly
holds that if polyexp can be written as a non-negative linear combination of
finitely many products of affine inequalities, then polyexp is non-negative whenever
all affine inequalities are non-negative. Hence, we may use the translation
in Handelman’s theorem to translate each implication in $\Phi_{\mathrm{inductive}}$ into a system
of purely existentially quantified constraints.

As Handelman’s theorem does not impose a bound on the number of products
of affine expressions that might appear in the translation, we parametrize the
algorithm with an upper bound $K$ on the maximal number of affine inequalities
appearing in each product. To that end, we define
$\mathrm{Prod}_K(\Phi) = \{\prod_{i=1}^{t} \phi_i \mid 0 \leq t \leq K,\ \phi_i \in \Phi\}$.
Let $M_K = |\mathrm{Prod}_K(\Phi)|$ be the total number of such products
and $\mathrm{Prod}_K(\Phi) = \{\phi_1, \dots, \phi_{M_K}\}$. Then, for any constraint of the form


$\forall \mathbf{x} \in \mathbb{R}^n.\ (\mathrm{affexp}_1(\mathbf{x}) \geq 0) \wedge \dots \wedge (\mathrm{affexp}_N(\mathbf{x}) \geq 0) \Rightarrow (\mathrm{polyexp}(\mathbf{x}) \geq 0)$,


we introduce fresh template variables $y_1, \dots, y_{M_K}$ and translate it into the system
of purely existentially quantified constraints


$(y_1 \geq 0) \wedge \dots \wedge (y_{M_K} \geq 0) \wedge (\mathrm{polyexp}(\mathbf{x}) \equiv_H y_1 \cdot \phi_1(\mathbf{x}) + \dots + y_{M_K} \cdot \phi_{M_K}(\mathbf{x}))$.


Here, $\mathrm{polyexp}(\mathbf{x}) \equiv_H y_1 \cdot \phi_1(\mathbf{x}) + \dots + y_{M_K} \cdot \phi_{M_K}(\mathbf{x})$ denotes the set of equalities
over template variables and $y_1, \dots, y_{M_K}$ which equate the constant coefficients
as well as the coefficients of each monomial over $\{x_1, \dots, x_n\}$ of degree at most
$K$ on two sides of the equivalence, as specified by Handelman’s theorem.

While our translation into purely existentially quantified constraints is not
complete due to the non-strict polynomial inequality and due to the parametrization
by $K$, Handelman’s theorem justifies the translation as it indicates that the
translation is “close to complete” for sufficiently large values of $K$.
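The bounded product set and the monomial-wise matching can be sketched with plain dictionaries as polynomials. The snippet enumerates all products of at most $K$ premises and verifies a given certificate; the function names, the one-variable example on the interval $[0, 1]$, and the multipliers are hypothetical, and products are indexed by multisets of premise indices, so coinciding products would be counted separately.

```python
from fractions import Fraction as F
from itertools import combinations_with_replacement

def poly_mul(p, q):
    """Multiply polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, F(0)) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def poly_add_scaled(p, q, y):
    """Return p + y*q."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, F(0)) + y * c
    return {e: c for e, c in out.items() if c != 0}

def products_up_to(phis, K, nvars):
    """Products of at most K expressions from phis (t = 0 gives the constant 1)."""
    one = {(0,) * nvars: F(1)}
    prods = []
    for t in range(K + 1):
        for combo in combinations_with_replacement(range(len(phis)), t):
            prod = one
            for idx in combo:
                prod = poly_mul(prod, phis[idx])
            prods.append((combo, prod))
    return prods

def certifies(target, prods, ys):
    """Check target == sum of ys[combo] * product, with all multipliers >= 0."""
    total = {}
    for combo, prod in prods:
        y = ys.get(combo, F(0))
        if y < 0:
            return False
        total = poly_add_scaled(total, prod, y)
    return total == target

# Premises x >= 0 and 1 - x >= 0; target 1 - x^2 >= 0.
x = {(1,): F(1)}
one_minus_x = {(0,): F(1), (1,): F(-1)}
prods = products_up_to([x, one_minus_x], K=2, nvars=1)
target = {(0,): F(1), (2,): F(-1)}
ys = {(1, 1): F(1), (0, 1): F(2)}  # 1 - x^2 = (1 - x)^2 + 2 * x * (1 - x)
```

With two premises and $K = 2$ there are six products, which hints at how quickly the number of auxiliary variables grows with $K$.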


Step 4: Constraint Solving. This step is analogous to Sect. 5.1 and we use an
off-the-shelf polynomial constraint solver to handle the resulting system of purely
existentially quantified polynomial constraints. If the solver outputs a solution,
we conclude that the computed $I$ is an inductive distributional invariant for the
computed strategy $\pi$ and initial distribution $\mu_0$, and that $I$ is contained in $H$.
Therefore, by Theorem 3, we conclude that $\mu_0$ is $H$-safe under $\pi$.


Theorem 6. Soundness: Suppose AlgDist returns a strategy $\pi$ and an affine
inductive distributional invariant $I$. Then, $\mu_0$ is $H$-safe under $\pi$.

Complexity: For any fixed parameter $K \in \mathbb{N}$, the runtime of AlgDist is in
PSPACE in the size of the MDP and the template size parameter $N_I \in \mathbb{N}$.


The proof can be found in [4, Sec. 5.2]. 


6 Discussion, Extensions, and Variants 


With our two algorithms in place, we remark on several interesting details and
possibilities for extensions.


Polynomial Expressions. Our second algorithm can also be extended to synthesizing
polynomial inductive distributional invariants, i.e. instead of defining the
invariant $I$ through a conjunction of affine linear expressions we could synthesize
polynomial expressions such as $x_1^2 + x_2 \cdot x_3 \leq 0.5$. This can be achieved by using
Putinar’s Positivstellensatz [59] instead of Handelman’s theorem in Step 3. This
technique has recently been used for generating polynomial inductive invariants
in programs in [20], and our translation in Step 3 can be analogously adapted to
synthesize polynomial inductive distributional invariants up to a specified degree.
In the same way, instead of requiring that $H$ is given as a conjunction of affine
linear constraints, we can also handle the case of polynomial constraints. The
same holds true for the probabilities of choosing certain actions $p_{s_i,a_j}(\mathbf{x})$. While
we have defined these as fractions of affine linear expressions, we could replace
them with rational functions, which we chose to exclude for sake of readability.


Uninitialized and Restricted Initial Case. We remark that we can directly incorporate
the uninitialized case in our algorithm. In particular, instead of requiring
that $I(\mu_0)$ holds for the concretely given initial values, we can instead existentially
quantify over the values of $\mu_0(s)$ and add the constraint that $\mu_0$ is
a distribution, i.e. $\mu_0 \in \Delta(S)$. This does not add universal quantification,
thus we do not need to apply any quantifier elimination for these variables. This
also subsumes and generalizes the ideas of [5], which observes that checking
whether a fixpoint of the transition dynamics lies within $H$ is sufficient. Choosing
$I = \{\mu^*\}$ where $\mu^*$ is such a fixpoint satisfies all of our constraints. See [4,
Sec. 6] for details.

Our algorithm is also able to handle the “intermediate” case, as follows. The 
uninitialized case leaves absolute freedom in the choice of initial distribution, 
while the initialized case concretely specifies one initial distribution. Here, we 
could as well impose some constraints on the initial distribution without fixing it 
completely, i.e. ask whether there exists an $H$-safe initial distribution $\mu_0$ which
satisfies a predicate $\Phi_{\mathrm{init}}$. If $\Phi_{\mathrm{init}}$ is a conjunction of affine linear constraints, we
can directly handle this query, too. Note that both initialized and uninitialized 
are special cases thereof. 


Non-Inductive Initial Steps. Instead of requiring to synthesize an invariant which
contains the initial distribution, we can explicitly write down the first $k$ distributions
and only then require an invariant and strategy to be found. More
concretely, the set of distributions that can be achieved in a given step $k$ while
remaining in $H$ can be explicitly computed; denote this set as $\Delta^k$. For a different
perspective, this describes the set of states reachable in $\mathcal{T}_M$ within $k$ steps and
corresponds to “unrolling” the MDP for a fixed number of steps. This then goes
hand in hand with the above “restricted initial case”, where we ask whether there
exists an $H$-safe distribution in $\Delta^k$. We conjecture that this could simplify the
search for distributional invariants for systems which have a lot of “transient”
behaviour, as observed in searching for invariants for state reachability [11].


Fig. 3. Our Split toy example. The MDP comprises two disconnected parts. Probability
mass flows from A to B and from C to D under all strategies.


7 Implementation and Evaluation 


While the main focus of our contribution lies on the theory, we validate the appli- 
cability through an unoptimized prototype implementation. We implemented 
our approach in Python 3.10, using SymPy 1.11 [50] to handle and simplify sym- 
bolic expressions, and PySMT 0.9 [36] to abstract communication with constraint 
solvers. We use z3 4.8 [53] and mathsat 5.6 [26] as back-ends. Our experiments 
were executed on consumer hardware (AMD Ryzen 3600 CPU with 16 GB 
RAM). 


Caveats. While the existential (non-linear) theory of the reals is known to be 
decidable, practical algorithms are less explored than, for example, SAT solving. 
In particular, runtimes are quite sensitive to minor changes in the input struc- 
ture and initial randomization (many solvers apply randomized algorithms). We 
observed differences of several orders of magnitude (going from seconds to hours) 
simply due to restarting the computation (leading to different initial seeds). Sim- 
ilarly, by strengthening the antecedents of implications by known facts, we also 
observed significant improvements. Concretely, given that we have constraints of
the form $I(\mathbf{x}) \Rightarrow H(\mathbf{x})$ and $I(\mathbf{x}) \Rightarrow \Phi(\mathbf{x})$, we observed that changing the second
constraint to $I(\mathbf{x}) \wedge H(\mathbf{x}) \Rightarrow \Phi(\mathbf{x})$ would drastically improve the runtime
even though the two are semantically equivalent.

This suggests that both improvements of our implementation as well as fur- 
ther work on constraint solvers are likely to have a significant impact on the 
runtime. 


Models. Aside from our running example of Fig. 1, which we refer to as Running 
here, we consider two further toy examples. 

The first model, called Chain, is a Markov chain defined as follows: We consider
the states $S = \{s_1, \dots, s_{10}\}$ and set $\delta(s_i) = \{s_{i+1} \mapsto 1\}$ for all $i < 10$ and
$\delta(s_{10}) = \{s_9 \mapsto \frac{1}{2}, s_{10} \mapsto \frac{1}{2}\}$. The initial distribution is given as $\mu_0(s_i) = \frac{1}{10}$ for
all $s_i \in S$ and the safe set by $H = \{\mu(s_{10}) \geq \frac{1}{10}\}$. We are mainly interested in
this model to demonstrate applicability to “larger” systems.
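A quick simulation can make the behaviour of such a chain tangible. The sketch below iterates a ten-state chain of this shape and checks a threshold on the last state for a finite prefix of the stream. It assumes that $s_{10}$ splits its mass evenly between $s_9$ and $s_{10}$, that the initial distribution is uniform, and that the threshold is $\frac{1}{10}$; these constants and the function names are assumptions of the sketch. A finite simulation is of course no substitute for an inductive invariant.

```python
from fractions import Fraction as F

N = 10  # states s1..s10, stored at indices 0..9

def chain_step(mu):
    """One step of the chain (constants as assumed in the lead-in)."""
    nxt = [F(0)] * N
    for i in range(N - 1):          # s_i moves to s_{i+1} with probability 1
        nxt[i + 1] += mu[i]
    nxt[N - 2] += mu[N - 1] / 2     # s10 -> s9 with probability 1/2
    nxt[N - 1] += mu[N - 1] / 2     # s10 -> s10 with probability 1/2
    return nxt

def safe_prefix(mu0, threshold, steps):
    """Check mu_i(s10) >= threshold for i = 0..steps."""
    mu = list(mu0)
    for _ in range(steps + 1):
        if mu[N - 1] < threshold:
            return False
        mu = chain_step(mu)
    return True

mu0 = [F(1, N)] * N  # uniform initial distribution
```

Under these assumptions the mass drifts towards $s_9$ and $s_{10}$, so the threshold is tight only at the initial distribution.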

The second model, called Split, is an MDP which actually comprises two
independent subsystems. We depict the model in Fig. 3. The initial distribution
is $\mu_0 = \{A \mapsto \frac{1}{2}, C \mapsto \frac{1}{2}\}$ and the safe set $H = \{\mu(A) + \mu(D) \geq \frac{1}{2}\}$. This aims to
explore both disconnected models as well as a safe set which imposes a constraint
on multiple states at once. In particular, observe that initially $\mu_0(D) = 0$ but
$\mu_i(D)$ converges to $\frac{1}{2}$ while $\mu_i(A)$ converges to 0, even if choosing action a1.
Thus, the invariant needs to identify the simultaneous flow from A to B and C
to D.


Table 1. Overview of our results for the five considered models. From left to right,
we list the name of the model, the runtime, and size of the invariant, followed by the
number of variables, constraints, and total size of the query passed to the constraint
solvers. For Running, we provided additional hints to the solver to achieve a more
consistent runtime, indicated by the dagger symbol.


Model             | Runtime | $N_I$ | #Var. | #Constr. | Size
Running           | 3s†     | 3     | 92    | 123      | 849
Chain             | 10s     | 2     | 69    | 82       | 666
Split             | 3s      | 3     | 60    | 69       | 571
PageRank          | 3s      | 2     | 44    | 52       | 536
Insulin-$^{131}$I | 2s      | 2     | 44    | 52       | 476


Table 2. The invariants and strategies computed for our models. We omit the invari- 
ants for the two real-world scenarios since they are too large to fit. 


Model | Computed Invariant and Strategy 


F A= 
Running |{A>4,B=t} mu) ={a rme n OS} 
Chain {s9 + s10 > E, S10 > at m = @ (Markov chain) 


Split | $\{B \leq D,\ A + B \geq C + D,\ 3 \cdot (C + D) - (A + B) \geq 1\}$   $\pi = \{a \mapsto 1\}$


We additionally consider two examples from the literature, namely the
PageRank example from [1, Fig. 3], based on [51], and Insulin-$^{131}$I, a pharmacokinetics
system [1, Example 2], based on [17]. Both are Markov chains.


Results. We summarize our findings briefly in Table 1. We again underline that 
not too much attention should be paid to runtimes, since they are very sensitive 
to minimal changes in the model. The evaluation is mainly intended to demonstrate 
that our methods are actually able to provide results. For completeness, 
we report the size of the invariant $N_I$ and the size of the constraint problem 
in terms of number of variables, constraints, and operations inside these 
constraints. We also provide the invariants and strategy identified by our method 
in Table 2. Note that for Running we used AlgDist, while the other two examples 
are handled by AlgMemLess. For Running, we observed a significant dependence 
on the initialization of the solvers. Thus we added several “hints”, i.e. known 
correct values for some variables. (To be precise, we set the value for eight of the 
92 variables.) 


Discussion. We remark on two related points. Firstly, we observe that very often 
most of the auxiliary variables introduced by the quantifier elimination 
have a value of zero. Thus, a potential optimization is to explicitly set most such 
variables to zero, check whether the formula is satisfiable, and, if not, gradually 
remove these constraints either at random or guided by unsat cores if available 
(i.e. clauses which are the “reason” for unsatisfiability). Secondly, we observed 
significant differences between the solvers: While z3 seems to be much quicker 
to identify unsatisfiability, mathsat usually is better at finding satisfying 
assignments. Hence, using both solvers in tandem seems to be very beneficial. 
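
The zero-hint optimization suggested above can be sketched as follows. This is only an illustration under stated assumptions: the function name `solve_with_zero_hints` and the callback `check` are hypothetical, `check` stands in for a call to an SMT solver and is assumed to return a satisfiability flag together with an optional unsat core, and the sketch starts by fixing all auxiliary variables to zero rather than "most" of them.

```python
import random
from typing import Callable, Iterable, Optional, Set, Tuple

# Hypothetical solver interface (an assumption, not a real solver API):
# given the set of variables forced to zero, return (is_sat, unsat_core).
# The core may be None if the solver cannot provide one.
CheckFn = Callable[[Set[str]], Tuple[bool, Optional[Set[str]]]]


def solve_with_zero_hints(aux_vars: Iterable[str], check: CheckFn,
                          rng: Optional[random.Random] = None) -> Optional[Set[str]]:
    """Return a set of variables that may be fixed to zero, or None if the
    query is unsatisfiable even without any zero hints."""
    rng = rng or random.Random(0)
    zeroed = set(aux_vars)
    while True:
        sat, core = check(zeroed)
        if sat:
            return zeroed
        if not zeroed:
            return None  # unsatisfiable without hints: nothing left to relax
        # Prefer dropping the hints blamed by the unsat core ...
        blamed = (core or set()) & zeroed
        if blamed:
            zeroed -= blamed
        else:
            # ... otherwise drop one hint at random.
            zeroed.discard(rng.choice(sorted(zeroed)))
```

Each iteration removes at least one hint, so the loop makes at most one solver call per auxiliary variable plus one final call.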


8 Conclusion 


We developed a framework for defining certificates for safety objectives in 
MDPs as distributional inductive invariants. Using this, we came up with two 
algorithms that synthesize linear/affine invariants and corresponding 
memoryless/general strategies for safety in MDPs. To the best of our knowledge, this is 
the first time the template-based invariant approach, already known to be 
successful for programs, has been applied to synthesize strategies in MDPs for 
distributional safety properties. Our experimental results show that affine invariants 
are sufficient for many interesting examples. However, the second approach can 
be lifted to synthesize polynomial invariants, and hence potentially applies to a 
large set of MDPs. Exploring this could be a future line of work. It would also be interesting 
to explore how one can automate distributional invariant synthesis if the safe set 
H is specified in terms of both strict and non-strict inequalities. Finally, in terms 
of applicability, we would like to apply this approach to solve more benchmarks 
and problems, e.g., to synthesize risk-aware strategies for MDPs [46,49]. 
Abstract. This paper marries two state-of-the-art controller synthesis 
methods for partially observable Markov decision processes (POMDPs), a 
prominent model in sequential decision making under uncertainty. A central 
issue is to find a POMDP controller (one that solely decides based on the 
observations seen so far) to achieve a total expected reward objective. As 
finding optimal controllers is undecidable, we concentrate on synthesising 
good finite-state controllers (FSCs). We do so by tightly integrating two 
modern, orthogonal methods for POMDP controller synthesis: a belief-based 
and an inductive approach. The former method obtains an FSC 
from a finite fragment of the so-called belief MDP, an MDP that keeps 
track of the probabilities of equally observable POMDP states. The latter 
is an inductive search technique over a set of FSCs, e.g., controllers with a 
fixed memory size. The key result of this paper is a symbiotic anytime 
algorithm that tightly integrates both approaches such that each profits from 
the controllers constructed by the other. Experimental results indicate a 
substantial improvement in the value of the controllers while significantly 
reducing the synthesis time and memory footprint. 




1 Introduction 


A formidable synthesis challenge is to find a decision-making policy that 
satisfies temporal constraints even in the presence of stochastic noise. Markov 
decision processes (MDPs) [26] are a prominent model to reason about such 
policies under stochastic uncertainty. The underlying decision problems are efficiently 
solvable and probabilistic model checkers such as PRISM [22] and STORM [13] are 
well-equipped to synthesise policies that provably (and optimally) satisfy a given 
specification. However, a major shortcoming of MDPs is the assumption that the 
policy can depend on the precise state of a system. This assumption is 
unrealistic whenever the state of the system is only observable via sensors. Partially 
observable MDPs (POMDPs) overcome this shortcoming, but policy synthesis for 
POMDPs and specifications such as “the probability to reach the exit is larger than 
50%” requires solving undecidable problems [23]. Nevertheless, in recent years, a 
variety of approaches have been successfully applied to a range of challenging 
benchmarks, but the approaches also fail somewhat spectacularly on seemingly 
tiny problem instances. From a user perspective, it is hard to pick the right 
approach without detailed knowledge of the underlying methods. This paper sets out 
to develop a framework in which conceptually orthogonal approaches 
symbiotically alleviate each other’s weaknesses and find policies that maximise, e.g., the 
expected reward before a target is reached. We show empirically that the 
combined approach can find compact policies achieving a significantly higher reward 
than the policies that either individual approach constructs. 


Fig. 1. Schematic depiction of the symbiotic approach 


Belief Exploration. Several approaches for solving POMDPs use the notion of 
beliefs [27]. The key idea is that each sequence of observations and actions induces 
a belief—a distribution over POMDP states that reflects the probability to be in 
a state conditioned on the observations. POMDP policies can decide optimally 
solely based on the belief. The evolution of beliefs can be captured by a fully 
observable, yet possibly infinite belief MDP. A practical approach (see the lower 
part of Fig. 1) is to unfold a finite fragment of this belief MDP and make its 
frontier absorbing. This finite fragment can be analysed with off-the-shelf MDP 
model checkers. Its accuracy can be improved by using an arbitrary but fixed 
cut-off policy from the frontier onwards. Crucially, the probability to reach the 
target under such a policy can be efficiently pre-computed for all beliefs. This 
paper considers the belief exploration method from [8] realised in STORM [13]. 


Policy Search. An orthogonal approach searches a (finite) space of policies [14, 
24] and evaluates these policies by verifying the induced Markov chain. To ensure 
scalability, sets of policies must be efficiently analysed. However, policy spaces 
explode whenever they require memory. The open challenge is to adequately 
define the space of policies to search in. In this paper, we consider the 
policy-search method from [5] as implemented in PAYNT [6] that explores spaces of 
finite-state controllers (FSCs), represented as deterministic Mealy machines [2], 
using a combination of abstraction-refinement, counterexamples (to prune sets 
of policies), and increasing a controller’s memory; see the upper part of Fig. 1. 


Our Symbiotic Approach. In essence, our idea relies on the fact that a policy 
found via one approach can boost the other approach. The key observation is that 
such a policy is beneficial even when it is sub-optimal in terms of the objective at 
hand. Figure 1 sketches the symbiotic approach. The FSCs $F_I$ obtained by policy 
search are used to guide the partial belief MDP to the target. Vice versa, the 
FSCs $F_B$ obtained from belief exploration are used to shrink the set of policies 
and to steer the abstraction. Our experimental evaluation, using a large set of 
POMDP benchmarks, reveals that (a) belief exploration can yield better FSCs 
(sometimes also faster) using FSCs $F_I$ from PAYNT, even if the latter FSCs 
are far from optimal, (b) policy search can find much better FSCs when using 
FSCs from belief exploration, and (c) the FSCs from the symbiotic approach are 
superior in value to the ones obtained by the standalone approaches. 


Beyond Exploration and Policy Search. In this work, we focus on two 
powerful orthogonal methods from the set of belief-based and search-based methods. 
Alternatives exist. Exploration can also be done using a fixed set of beliefs [25]. 
Prominently, HSVI [18] and SARSOP [20] are belief-based policy synthesis 
approaches typically used for discounted properties. They also support 
undiscounted properties, but represent policies with α-vectors. Bounded policy 
synthesis [29] uses a combination of belief-exploration and inductive synthesis over paths 
and addresses finite horizon reachability. α-vector policies lead to more complex 
analysis downstream: the resulting policies must track the belief and do 
floating-point computations to select actions. For policy search, prominent alternatives 
are to search for randomised controllers via gradient descent [17] or via convex 
optimization [1,12,19]. Alternatively, FSCs can be extracted via deep 
reinforcement learning [9]. However, randomised policies limit predictability, which 
hampers testing and explainability. The area of programmatic reinforcement 
learning [28] combines inductive synthesis ideas with RL. While our empirical 
evaluation is method-specific, the lessons carry over to integrating other methods. 


Contributions. The key contribution of this paper is the symbiosis of belief 
exploration [8] and policy search [5]. Though this seems natural, various 
technical obstacles had to be addressed, e.g., obtaining $F_B$ from the finite fragment 
of the belief MDP and the policies for its frontier and developing an interplay 
between the exploration and search phases that minimises the overhead. The 
benefits of the symbiotic algorithm are manifold, as we show by a thorough 
empirical evaluation. It can solve POMDPs that cannot be tackled with either 
of the two approaches alone. It outputs FSCs that are superior in value (with 
relative improvements of up to 40%) as well as FSCs that are more succinct 
(with reductions of up to two orders of magnitude) with only a small 
penalty in their values. Additionally, the integration reduces the memory 
footprint compared to belief exploration by a factor of 4. In conclusion, the proposed 
symbiosis offers a powerful push-button, anytime synthesis algorithm producing, 
in the given time, superior and/or more succinct FSCs compared to the 
state-of-the-art methods. 


Fig. 2. (a) and (b) contain two POMDPs. Colours encode observations. Unlabelled 
transitions have probability 1. Omitted actions (e.g. $\gamma, \delta$ in state $B_2$) execute a 
self-loop. (c) Markov chain induced by the minimising policy $\sigma_B$ in the finite abstraction 
$\overline{\mathcal{M}_a^B}$ of the POMDP from Fig. 2a. In the rightmost state, policy $F$ is applied (cut-off), 
allowing to reach the target in p steps. (Color figure online) 


2 Motivating Examples 


We give a sample POMDP that is hard for the belief exploration, a POMDP that 
challenges the policy search approach, and indicate why a symbiotic approach 
overcomes this. A third sample POMDP is shown to be unsolvable by either 
approach alone but can be treated by the symbiotic one. 


A Challenging POMDP for Belief-Based Exploration. Consider POMDP 
$\mathcal{M}_a$ in Fig. 2a. The objective is to minimise the expected number of steps to the 
target $T_a$. An optimal policy is to always take action $\alpha$, yielding 4 expected steps. 
An FSC realising this policy can be found by a policy search in under 1 s. 


Belief MDPs. States in the belief MDP $\mathcal{M}_a^B$ are beliefs, probability distributions 
over POMDP states with equal observations. The initial belief is $\{S \mapsto 1\}$. By 
taking action $\alpha$, ‘yellow’ is observed and the belief becomes a distribution over 
the states $L$ and $R$. Closer inspection shows that the set of reachable beliefs is 
infinite, rendering $\mathcal{M}_a^B$ infinite. Belief exploration constructs a finite fragment 
$\overline{\mathcal{M}_a^B}$ by exploring $\mathcal{M}_a^B$ up to some depth while cutting off the frontier states. 
From cut-off states, a shortcut is taken directly to the target. These shortcuts are 
heuristic over-approximations of the true number of expected steps from the cut-off 
state to the target. The finite MDP $\overline{\mathcal{M}_a^B}$ can be analysed using off-the-shelf tools, 
yielding the minimising policy $\sigma_B$ that assigns to each belief state the optimal action. 


Admissible Heuristics. A simple way to over-approximate the minimal expected 
number of steps to the target is to use an arbitrary controller 
$F$ and use the expected number of steps under $F$. The latter is cheap if $F$ is 
compact, as detailed in Sect. 4.2. Figure 2c shows a Markov chain induced by 
$\sigma_B$ in $\overline{\mathcal{M}_a^B}$, where the rightmost belief over $L$ and $R$ is cut off using $F$. The belief 
exploration in STORM [8] unfolds 1000 states of $\mathcal{M}_a^B$ and finds controller $F$ that 
uniformly randomises over all actions in the rightmost state. The resulting 
sub-optimal controller $F_B$ reaches the target in ≈4.1 steps. Exploring only a few 
states suffices when replacing $F$ by a (not necessarily optimal) FSC provided by 
a policy search. 


A Challenging POMDP for Policy Search. Consider POMDP $\mathcal{M}_b$ in 
Fig. 2b. The objective is to minimise the expected number of steps to $T_b$. Its 
9-state belief MDP $\mathcal{M}_b^B$ is trivial for the belief-based method. Its optimal 
controller $\sigma_B$ first picks action $\gamma$; on observing ‘yellow’ it plays $\beta$ twice, otherwise it 
always picks $\alpha$. This is realised by an FSC with 3 memory states. The inductive 
policy search in PAYNT [5] explores families of FSCs of increasing complexity, 
i.e., of increasing memory size. It finds the optimal FSC after consulting about 
20 billion candidate policies. This requires 545 model-checking queries; the 
optimal one is found after 105 queries while the remaining queries prove that no 
better 3-state FSC exists. 


Reference Policies. The policy search is guided by a reference policy, in this 
case the fully observable MDP policy that picks (senseless) action $\delta$ in $B_1$ first. 
Using policy $\sigma_B$, obtained by the belief method, instead, $\delta$ is never taken. As 
$\sigma_B$ picks in each ‘blue’ state a different action, mimicking this requires at least 
three memory states. Using $\sigma_B$ reduces the total number of required 
model-checking queries by a factor of ten; the optimal 3-state FSC is found after 23 
queries. 


The Potential of Symbiosis. To further exemplify the limitation of the two 
approaches and the potential of their symbiosis, we consider a synthetic POMDP, 
called Lanes+, combining a Lane model with larger variants of the POMDPs in 
Fig. 2; see Table 2 on page 14 for the model statistics and Appendix C of [3] 
for the model description. We consider minimisation of the expected number of 
steps and a 15-min timeout. The belief-based approach by STORM yields the 
value 18870. The policy search method by PAYNT finds an FSC with 2 memory 
states achieving the value 8223. This sub-optimal FSC significantly improves the 


118 R. Andriushchenko et al. 


belief MDP approximation and enables STORM to find an FSC with value 6471. 
The symbiotic synthesis loop finds the optimal FSC with value 4805. 


3 Preliminaries and Problem Statement 


A (discrete) distribution over a countable set $A$ is a function $\mu\colon A \to [0,1]$ 
s.t. $\sum_{a \in A} \mu(a) = 1$. The set $\mathrm{supp}(\mu) := \{a \in A \mid \mu(a) > 0\}$ is the support of $\mu$. The 
set $\mathit{Distr}(A)$ contains all distributions over $A$. We use Iverson bracket notation, 
where $[x] = 1$ if the Boolean expression $x$ evaluates to true and $[x] = 0$ otherwise. 


Definition 1 (MDP). A Markov decision process (MDP) is a tuple $M = 
(S, s_0, \mathit{Act}, P)$ with a countable set $S$ of states, an initial state $s_0 \in S$, a finite 
set $\mathit{Act}$ of actions, and a partial transition function $P\colon S \times \mathit{Act} \rightharpoonup \mathit{Distr}(S)$. 
$\mathit{Act}(s) := \{\alpha \in \mathit{Act} \mid P(s,\alpha) \neq \bot\}$ denotes the set of actions available in state 
$s \in S$. An MDP with $|\mathit{Act}(s)| = 1$ for each $s \in S$ is a Markov chain (MC). 


Unless stated otherwise, we assume $\mathit{Act}(s) = \mathit{Act}$ for each $s \in S$ for conciseness. 
We denote $P(s, \alpha, s') := P(s,\alpha)(s')$. A (finite) path of an MDP $M$ is a sequence 
$\pi = s_0 \alpha_0 s_1 \alpha_1 \ldots s_n$ where $P(s_i, \alpha_i, s_{i+1}) > 0$ for $0 \le i < n$. We use $\mathit{last}(\pi)$ to 
denote the last state of path $\pi$. Let $\mathit{Paths}^{M}$ denote the set of all finite paths 
of $M$. State $s$ is absorbing if $\mathrm{supp}(P(s, \alpha)) = \{s\}$ for all $\alpha \in \mathit{Act}$. 


Definition 2 (POMDP). A partially observable MDP (POMDP) is a tuple 
$\mathcal{M} = (M, Z, O)$, where $M$ is the underlying MDP, $Z$ is a finite set of observations 
and $O\colon S \to Z$ is a (deterministic) observation function. 


For POMDP $\mathcal{M}$ with underlying MDP $M$, an observation trace of path $\pi = 
s_0 \alpha_0 s_1 \alpha_1 \ldots s_n$ is a sequence $O(\pi) := O(s_0) \alpha_0 O(s_1) \alpha_1 \ldots O(s_n)$. Every MDP 
can be interpreted as a POMDP with $Z = S$ and $O(s) = s$ for all $s \in S$. 

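A (deterministic) policy is a function $\sigma\colon \mathit{Paths}^{M} \to \mathit{Act}$. Policy $\sigma$ is 
memoryless if $\mathit{last}(\pi) = \mathit{last}(\pi') \Rightarrow \sigma(\pi) = \sigma(\pi')$ for all $\pi, \pi' \in \mathit{Paths}^{M}$. A 
memoryless policy $\sigma$ maps a state $s \in S$ to action $\sigma(s)$. Policy $\sigma$ is observation-based 
if $O(\pi) = O(\pi') \Rightarrow \sigma(\pi) = \sigma(\pi')$ for all $\pi, \pi' \in \mathit{Paths}^{M}$. For POMDPs, we 
always consider observation-based policies. We denote by $\Sigma_{obs}$ the set of all 
observation-based policies. A policy $\sigma \in \Sigma_{obs}$ induces the MC $\mathcal{M}^{\sigma}$. 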

We consider indefinite-horizon reachability or expected total reward 
properties. Formally, let $M = (S, s_0, \mathit{Act}, P)$ be an MC, and let $T \subseteq S$ be a set 
of target states. $\mathbb{P}^{M}[s \models \Diamond T]$ denotes the probability of reaching $T$ from state 
$s \in S$. We use $\mathbb{P}^{M}[\Diamond T]$ to denote $\mathbb{P}^{M}[s_0 \models \Diamond T]$ and omit the superscript if 
the MC is clear from context. Now assume POMDP $\mathcal{M}$ with underlying MDP 
$M = (S, s_0, \mathit{Act}, P)$, and a set $T \subseteq S$ of absorbing target states. Without 
loss of generality, we assume that the target states are associated with the 
unique observation $z^T \in Z$, i.e. $s \in T$ iff $O(s) = z^T$. For a POMDP $\mathcal{M}$ and 
$T \subseteq S$, the maximal reachability probability of $T$ for state $s \in S$ in $\mathcal{M}$ is 
$\mathbb{P}^{\mathcal{M}}_{\max}[s \models \Diamond T] := \sup_{\sigma \in \Sigma_{obs}} \mathbb{P}^{\mathcal{M}^{\sigma}}[s \models \Diamond T]$. The minimal reachability 
probability $\mathbb{P}^{\mathcal{M}}_{\min}[s \models \Diamond T]$ is defined analogously. 


Finite-state controllers are automata that compactly encode policies. 
Definition 3 (FSC). A finite-state controller (FSC) is a tuple $F = 
(N, n_0, \gamma, \delta)$, with a finite set $N$ of nodes, the initial node $n_0 \in N$, the action 
function $\gamma\colon N \times Z \to \mathit{Act}$ and the update function $\delta\colon N \times Z \times Z \to N$. 


A $k$-FSC is an FSC with $|N| = k$. If $k = 1$, the FSC encodes a memoryless policy. 
We use $\mathcal{F}^{\mathcal{M}}$ ($\mathcal{F}^{\mathcal{M}}_k$) to denote the family of all ($k$-)FSCs for POMDP $\mathcal{M}$. For a 
POMDP in state $s$, an agent receives observation $z = O(s)$. An agent following 
an FSC $F$ executes action $\alpha = \gamma(n, z)$ associated with the current node $n$ and 
the current (prior) observation $z$. The POMDP state is updated accordingly to 
some $s'$ with $P(s, \alpha, s') > 0$. Based on the next (posterior) observation $z' = 
O(s')$, the FSC evolves to node $n' = \delta(n, z, z')$. The induced MC for FSC $F$ is 
$\mathcal{M}^{F} = (S \times N, (s_0, n_0), \{\alpha\}, P^{F})$, where for all $(s, n), (s', n') \in S \times N$ we have 


$$P^{F}\big((s, n), \alpha, (s', n')\big) = \big[n' = \delta(n, O(s), O(s'))\big] \cdot P\big(s, \gamma(n, O(s)), s'\big).$$ 
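
To make the product construction concrete, the following is a minimal sketch of how the induced MC could be built for a finite POMDP. The dictionary-based encoding and the name `induced_mc` are assumptions chosen for illustration; they do not reflect the data structures of any particular tool.

```python
def induced_mc(P, O, gamma, delta, nodes):
    """Build the Markov chain induced by an FSC on a finite POMDP.

    Assumed (hypothetical) encoding:
      P[s][a]            -> dict mapping successor state s2 to probability
      O[s]               -> observation of state s
      gamma[(n, z)]      -> action chosen in node n under observation z
      delta[(n, z, z2)]  -> next node for prior z and posterior z2
    Returns a dict mapping (s, n) to a distribution over pairs (s2, n2).
    """
    mc = {}
    for s in P:
        for n in nodes:
            z = O[s]
            a = gamma[(n, z)]
            row = {}
            for s2, prob in P[s][a].items():
                # The node update is deterministic, so each successor state
                # s2 is paired with exactly one successor node n2.
                n2 = delta[(n, z, O[s2])]
                row[(s2, n2)] = row.get((s2, n2), 0.0) + prob
            mc[(s, n)] = row
    return mc
```

Because the update function is deterministic, every row of the result inherits its total probability mass from the corresponding POMDP transition.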


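We emphasise that for MDPs with infinite state space and POMDPs, an 
FSC realising the maximal reachability probability generally does not exist. 
For FSC $F \in \mathcal{F}^{\mathcal{M}}$ with the set $N$ of memory nodes, let $\mathbb{P}^{\mathcal{M}^{F}}[(s,n) \models \Diamond T] := 
\mathbb{P}^{\mathcal{M}^{F}}[(s,n) \models \Diamond (T \times N)]$ denote the probability of reaching target states $T$ from 
state $(s,n) \in S \times N$. Analogously, $\mathbb{P}^{\mathcal{M}^{F}}[\Diamond T] := \mathbb{P}^{\mathcal{M}^{F}}[\Diamond (T \times N)]$ denotes the 
probability of reaching target states $T$ in the MC $\mathcal{M}^{F}$ induced on $\mathcal{M}$ by $F$. 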


Problem Statement. The classical synthesis problem [23] for POMDPs asks: 
given POMDP $\mathcal{M}$, a set $T$ of targets, and a threshold $\lambda$, find an FSC $F$ such that 
$\mathbb{P}^{\mathcal{M}^{F}}[\Diamond T] > \lambda$, if one exists. We take a more practical stance and aim instead to 
optimise the value $\mathbb{P}^{\mathcal{M}^{F}}[\Diamond T]$ in an anytime fashion: the faster we can find FSCs 
with a high value, the better. 


Remark 1. Variants of the maximising synthesis problem for the expected total 
reward and minimisation are defined analogously. For conciseness, in this paper, 
we always assume that we want to maximise the value. 


In addition to the value of the FSC F, another key characteristic of the controller 
is its size, which we treat as a secondary objective and discuss in detail in Sect. 6. 


4 FSCs for and from Belief Exploration 


We consider belief exploration as described in [8]. A schematic overview is given 
in the lower part of Fig. 1. We recap the key concepts of belief exploration. This 
section explains two contributions: we discuss how arbitrary FSCs are included 
and present an approach to export the associated POMDP policies as FSCs. 


4.1 Belief Exploration with Explicit FSC Construction 


Finite-state controllers for a POMDP can be obtained by analysing the (fully 
observable) belief MDP [27]. The state space of this MDP consists of beliefs: 
probability distributions over states of the POMDP $\mathcal{M}$ having the same 
observation. Let $S_z := \{s \in S \mid O(s) = z\}$ denote the set of all states of $\mathcal{M}$ with 
observation $z \in Z$. Let the set of all beliefs be $\mathcal{B}_{\mathcal{M}} := \bigcup_{z \in Z} \mathit{Distr}(S_z)$ and denote 
for $b \in \mathcal{B}_{\mathcal{M}}$ by $O(b) \in Z$ the unique observation $O(s)$ of any $s \in \mathrm{supp}(b)$. 

In a belief $b$, taking action $\alpha$ yields an updated belief as follows: let 

$$P(b, \alpha, z') := \sum_{s \in S_{O(b)}} b(s) \cdot \sum_{s' \in S_{z'}} P(s, \alpha, s')$$ 

denote the probability of observing $z' \in Z$ upon taking action $\alpha \in \mathit{Act}$ in belief 
$b \in \mathcal{B}_{\mathcal{M}}$. If $P(b, \alpha, z') > 0$, the corresponding successor belief $b' = [\![b|\alpha, z']\!]$ with 
$O(b') = z'$ is defined component-wise as 

$$[\![b|\alpha, z']\!](s') := \frac{\sum_{s \in S_{O(b)}} b(s) \cdot P(s, \alpha, s')}{P(b, \alpha, z')}$$ 

for all $s' \in S_{z'}$. Otherwise, $[\![b|\alpha, z']\!]$ is undefined. 
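
The belief update above can be sketched in a few lines. The encoding below (nested dictionaries `P`, an observation map `O`, and the function names) is an assumption made for illustration only; beliefs are represented as dictionaries from states to probabilities.

```python
def observation_probability(b, a, z2, P, O):
    """Probability of observing z2 after taking action a in belief b.

    Assumed (hypothetical) encoding: P[s][a] maps successor states to
    probabilities, O[s] is the observation of state s.
    """
    return sum(b_s * prob
               for s, b_s in b.items()
               for s2, prob in P[s][a].items()
               if O[s2] == z2)


def belief_update(b, a, z2, P, O):
    """Successor belief for action a and posterior observation z2,
    or None if z2 cannot be observed (the update is then undefined)."""
    norm = observation_probability(b, a, z2, P, O)
    if norm == 0:
        return None
    succ = {}
    for s, b_s in b.items():
        for s2, prob in P[s][a].items():
            if O[s2] == z2:
                # Accumulate the mass flowing into s2, normalised by the
                # probability of the observation.
                succ[s2] = succ.get(s2, 0.0) + b_s * prob / norm
    return succ
```

Unfolding the belief MDP then amounts to repeatedly applying `belief_update` for every action and every observation with positive probability, starting from the initial belief.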


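Definition 4 (Belief MDP). The belief MDP of POMDP $\mathcal{M}$ is the MDP 
$\mathcal{M}^{B} = (\mathcal{B}_{\mathcal{M}}, b_0, \mathit{Act}, P^{B})$, with initial belief $b_0 := \{s_0 \mapsto 1\}$ and transition 
function $P^{B}(b, \alpha, b') := [b' = [\![b|\alpha, z']\!]] \cdot P(b, \alpha, z')$ where $z' = O(b')$. 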


The belief MDP captures the behaviour of its POMDP. It can be unfolded by 
starting in the initial belief and computing all successor beliefs. 


Deriving FSCs from Finite Belief MDPs. Let $T^{B} := \{b \in \mathcal{B}_{\mathcal{M}} \mid O(b) = z^T\}$ 
denote the set of target beliefs. If the reachable state space of the belief MDP $\mathcal{M}^{B}$ 
is finite, e.g. because the POMDP is acyclic, standard model checking techniques 
can be applied to compute the memoryless policy $\sigma_B\colon \mathcal{B}_{\mathcal{M}} \to \mathit{Act}$ that selects 
in each belief state $b \in \mathcal{B}_{\mathcal{M}}$ the action that maximises $\mathbb{P}[b \models \Diamond T^{B}]$.¹ We can 
translate the deterministic, memoryless policy $\sigma_B$ into the corresponding FSC 
$F_B = (\mathcal{B}_{\mathcal{M}}, b_0, \gamma, \delta)$ with action function $\gamma(b, z) = \sigma_B(b)$ and update function 
$\delta(b, z, z') = [\![b|\sigma_B(b), z']\!]$ for all $z, z' \in Z$.² 


Handling Large and Infinite Belief MDPs. In case the reachable state space of 
the belief MDP $\mathcal{M}^{B}$ is infinite or too large for a complete unfolding, a finite 
approximation $\overline{\mathcal{M}^{B}}$ is used instead [8]. Assuming $\mathcal{M}^{B}$ is unfolded up to some 
depth, let $\mathcal{E} \subset \mathcal{B}_{\mathcal{M}}$ denote the set of explored beliefs and let $\mathcal{U} \subset \mathcal{B}_{\mathcal{M}} \setminus \mathcal{E}$ denote 
the frontier: the set of unexplored beliefs reachable from $\mathcal{E}$ in one step. To 
complete the finite abstraction, we require handling of the frontier beliefs. The 
idea is to use for each $b \in \mathcal{U}$ a cut-off value $V(b)$: an under-approximation of the 
maximal reachability probability $\mathbb{P}^{\mathcal{M}^{B}}_{\max}[b \models \Diamond T^{B}]$ for $b$ in the belief MDP. We 
explain how to compute cut-off values systematically given an FSC in Sect. 4.2. 


Ultimately, we define a finite MDP $\overline{\mathcal{M}^{B}} = (\mathcal{E} \cup \mathcal{U} \cup \{b_\top, b_\bot\}, b_0, \mathit{Act}, \overline{P^{B}})$ with 
the transition function: $\overline{P^{B}}(b, \alpha) := P^{B}(b, \alpha)$ for explored beliefs $b \in \mathcal{E}$ and all 
$\alpha \in \mathit{Act}$, and $\overline{P^{B}}(b, \alpha) := \{b_\top \mapsto V(b), b_\bot \mapsto 1 - V(b)\}$ for frontier beliefs $b \in \mathcal{U}$ 
and all $\alpha \in \mathit{Act}$, where $b_\top$ and $b_\bot$ are fresh sink states, i.e. $\overline{P^{B}}(b_\top, \alpha) := \{b_\top \mapsto 1\}$ 
and $\overline{P^{B}}(b_\bot, \alpha) := \{b_\bot \mapsto 1\}$ for all $\alpha \in \mathit{Act}$. The reachable state space of $\overline{\mathcal{M}^{B}}$ 
is finite, enabling its automated analysis; since our method to compute cut-off 
values emulates an FSC, a policy maximising $\mathbb{P}^{\overline{\mathcal{M}^{B}}}[\Diamond (T^{B} \cup \{b_\top\})]$ induces an 
FSC for the original POMDP $\mathcal{M}$. We discuss how to obtain this FSC in Sect. 4.3. 


¹ Memoryless policies suffice to maximise the value in a fully observable MDP [26]. 
² The assignments of missing combinations where $z \neq O(b)$ are irrelevant. 


4.2 Using FSCs for Cut-Off Values 


A crucial aspect when applying the belief exploration with cut-offs is the choice 
of suitable cut-off values. The closer the cut-off value is to the actual optimum 
in a belief, the better the approximation we obtain. In particular, if the cut-off 
values coincide with the optimal value, cutting off the initial state is optimal. 
However, finding optimal values is as hard as solving the original POMDP. We 
consider under-approximative value functions induced by applying any³ FSC to 
the POMDP and lifting the results to the belief MDP. The better the FSC, the 
better the cut-off value. We generalise belief exploration with cut-offs such that 
the approach supports arbitrary sets of FSCs with additional flexibility. 

Let $F_I \in \mathcal{F}^{\mathcal{M}}$ be an arbitrary, but fixed FSC for POMDP $\mathcal{M}$. Let $p_{s,n} := 
\mathbb{P}^{\mathcal{M}^{F_I}}[(s,n) \models \Diamond T]$ for state $(s,n) \in S \times N$ in the corresponding induced MC. 
For fixed $n \in N$, $V(b,n) := \sum_{s \in S_{O(b)}} b(s) \cdot p_{s,n}$ denotes the cut-off value for 
belief $b$ and memory node $n$. It corresponds to the probability of reaching a 
target state in $\mathcal{M}^{F_I}$ when starting in memory node $n \in N$ and state $s \in S$ 
according to the probability distribution $b$. We define the overall cut-off value 
for $b$ induced by $F_I$ as $V(b) := \max_{n \in N} V(b,n)$. It follows straightforwardly 
that $V(b) \le \mathbb{P}^{\mathcal{M}^{B}}_{\max}[b \models \Diamond T^{B}]$. As values $p_{s,n}$ only need to be computed once, 
computing $V(b)$ for a given belief $b$ is relatively simple. However, the complexity 
of the FSC-based cut-off approach depends on the size of the induced MC. 
Therefore, it is essential that the FSCs used to compute cut-off values are concise. 
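
As an illustration of why the cut-off computation is cheap once the values of the induced MC are known, consider the sketch below. It assumes the per-state values have already been obtained by model checking the induced MC and are supplied as a dictionary `p` keyed by (state, node) pairs; the function name and this encoding are hypothetical.

```python
def cutoff_value(b, p, nodes):
    """Return (V(b), n*), where n* is a memory node maximising V(b, n).

    Assumed (hypothetical) encoding:
      b          -> dict mapping states to probabilities (the belief)
      p[(s, n)]  -> precomputed reachability value from state s and node n
                    in the MC induced by the fixed FSC
      nodes      -> iterable of memory nodes of that FSC
    """
    def value_from(n):
        # V(b, n): weighted sum of the precomputed values under belief b.
        return sum(b_s * p[(s, n)] for s, b_s in b.items())

    best = max(nodes, key=value_from)
    return value_from(best), best
```

Each call costs time proportional to the size of the support of the belief times the number of memory nodes, which is one reason concise FSCs are preferable here.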


4.3 Extracting FSC from Belief Exploration 


Model checking the finite approximation MDP $\overline{\mathcal{M}^{B}}$ with cut-off values induced 
by an FSC $F_I$ yields a maximising memoryless policy $\sigma_B$. Our goal is to represent 
this policy as an FSC $F_B$. We construct $F_B$ by considering both $F_I$ and the 
necessary memory nodes for each explored belief $b \in \mathcal{E}$. Concretely, for each 
explored belief, we introduce a corresponding memory node. In each such node, 
the action $\sigma_B(b)$ is selected. For the memory update, we distinguish between two 
cases based on the next belief after executing $\sigma_B(b)$ in $\mathcal{M}^{B}$. If for observation 
$z' \in Z$, the successor belief $b' = [\![b|\sigma_B(b), z']\!] \in \mathcal{E}$, the memory is updated to 
the corresponding node. Otherwise, $b' \in \mathcal{U}$ holds, i.e., the successor is part of 
the frontier. The memory is then updated to the memory node $n$ of FSC $F_I$ 
that maximises the cut-off value $V(b', n)$. This corresponds to the notion that 
if the frontier is encountered, we switch from acting according to policy $\sigma_B$ to 
following $F_I$ (initialised in the correct memory node). This is formalised as: 


3 We remark that [8] considers memoryless FSCs only. 
Definition 5 (Belief-based FSC with cut-offs). Let $F_I = (N, n_0, \gamma_I, \delta_I)$ 
and $\overline{\mathcal{M}^{B}}$ as before. The belief-based FSC with cut-offs is $F_B = (\mathcal{E} \cup N, b_0, \gamma, \delta)$ 
with action function $\gamma(b, z) = \sigma_B(b)$ for $b \in \mathcal{E}$ and $\gamma(n, z) = \gamma_I(n, z)$ for $n \in N$ 
and arbitrary $z \in Z$. The update function $\delta$ is defined for all $z, z' \in Z$ by 
$\delta(n, z, z') = \delta_I(n, z, z')$ if $n \in N$, and for $b \in \mathcal{E}$ with $b' = [\![b|\sigma_B(b), z']\!]$ by: 


$$\delta(b, z, z') = b' \text{ if } b' \in \mathcal{E}, \quad \text{and} \quad \delta(b, z, z') = \mathrm{argmax}_{n \in N} V(b', n) \text{ otherwise.}$$ 
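
The case distinction of Definition 5 can be sketched as a pair of closures. All names below are hypothetical: beliefs are assumed to be hashable labels disjoint from the nodes of the cut-off FSC, `successor` is assumed to implement the belief update (returning None when it is undefined), and `cutoff_node` is assumed to return a node maximising the cut-off value.

```python
def make_belief_fsc(explored, sigma_B, successor, gamma_I, delta_I, cutoff_node):
    """Return the action and update functions of a belief-based FSC with cut-offs.

    Assumed (hypothetical) encoding:
      explored            -> set of hashable explored beliefs
      sigma_B[b]          -> action chosen in explored belief b
      successor(b, a, z2) -> successor belief, or None if undefined
      gamma_I[(n, z)], delta_I[(n, z, z2)] -> the cut-off FSC
      cutoff_node(b2)     -> node of the cut-off FSC maximising V(b2, n)
    """
    def gamma(node, z):
        # Explored beliefs follow the belief policy, other nodes the cut-off FSC.
        return sigma_B[node] if node in explored else gamma_I[(node, z)]

    def delta(node, z, z2):
        if node not in explored:
            return delta_I[(node, z, z2)]
        b2 = successor(node, sigma_B[node], z2)
        if b2 is None:
            return node  # unreachable combination, so the choice is irrelevant
        # Stay among the explored beliefs if possible, otherwise hand over
        # to the cut-off FSC in its best memory node.
        return b2 if b2 in explored else cutoff_node(b2)

    return gamma, delta
```

Once the controller has switched to a node of the cut-off FSC it never returns to a belief node, which mirrors the one-way hand-over described in the text.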


5 Accelerated Inductive Synthesis 


In this section, we consider inductive synthesis [5], an approach for finding 
controllers for POMDPs in a set of FSCs. We briefly recap the main idea, then 
explain how to use a reference policy. Finally, we introduce and discuss a novel 
search space for the controllers that we consider in this paper in detail. 


5.1 Inductive Synthesis with k-FSCs 


In the scope of this paper, inductive synthesis [4] considers a finite family 
$\mathcal{F}^{\mathcal{M}}_k$ of $k$-FSCs with memory nodes $N = \{n_0, \ldots, n_{k-1}\}$, and the family 
$\mathcal{M}^{\mathcal{F}^{\mathcal{M}}_k} := \{\mathcal{M}^{F} \mid F \in \mathcal{F}^{\mathcal{M}}_k\}$ of associated induced MCs. The states for each 
MC are tuples $(s,n) \in S \times N$. For conciseness, we only discuss the 
abstraction-refinement framework [10] within the inductive synthesis loop. The overall picture 
is as in Fig. 1. Informally, the MDP abstraction of the family $\mathcal{M}^{\mathcal{F}^{\mathcal{M}}_k}$ of MCs is an 
MDP $\mathrm{MDP}(\mathcal{F}^{\mathcal{M}}_k)$ with the set $S \times N$ of states such that, if some MC $M \in \mathcal{M}^{\mathcal{F}^{\mathcal{M}}_k}$ 
executes action $\alpha$ in state $(s,n) \in S \times N$, then this action (with the same 
effect) is also enabled in state $(s,n)$ of $\mathrm{MDP}(\mathcal{F}^{\mathcal{M}}_k)$. Essentially, $\mathrm{MDP}(\mathcal{F}^{\mathcal{M}}_k)$ 
over-approximates the behaviour of all the MCs in the family $\mathcal{M}^{\mathcal{F}^{\mathcal{M}}_k}$: it simulates an 
arbitrary family member in every step, but it may switch between steps.⁴ 


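Definition 6. MDP abstraction for POMDP $\mathcal{M}$ and family $\mathcal{F}^{\mathcal{M}}_k = \{F_1, \ldots, 
F_m\}$ of $k$-FSCs is the MDP $\mathrm{MDP}(\mathcal{F}^{\mathcal{M}}_k) := (S \times N, (s_0, n_0), \{1, \ldots, m\}, P^{\mathcal{F}^{\mathcal{M}}_k})$ 
with 

$$P^{\mathcal{F}^{\mathcal{M}}_k}\big((s,n), i\big) = P^{F_i}\big((s,n), \alpha\big).$$ 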


While this MDP has m actions, practically, many actions coincide. Below, we 
see how to utilise the structure of the FSCs. Here, we finish by observing that 
the MDP is a proper abstraction: 


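Lemma 1. [10] For all $F \in \mathcal{F}^{\mathcal{M}}_k$, $\mathbb{P}^{\mathrm{MDP}(\mathcal{F}^{\mathcal{M}}_k)}_{\min}[\Diamond T] \le \mathbb{P}^{\mathcal{M}^{F}}[\Diamond T] \le 
\mathbb{P}^{\mathrm{MDP}(\mathcal{F}^{\mathcal{M}}_k)}_{\max}[\Diamond T]$. 

With that result, we can naturally start with the set of all $k$-FSCs and search 
through this family by selecting suitable subsets [10]. Since the number $k$ of 
memory nodes necessary is not known in advance, one can iteratively explore 
the sequence $\mathcal{F}^{\mathcal{M}}_1, \mathcal{F}^{\mathcal{M}}_2, \ldots$ of families of FSCs of increasing complexity. 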


⁴ The MDP is a game-based abstraction [21] of the all-in-one MC [11]. 
5.2 Using Reference Policies to Accelerate Inductive Synthesis 


Consider the synthesis process of the optimal $k$-FSC $F \in \mathcal{F}^{\mathcal{M}}_k$ for POMDP 
$\mathcal{M}$. To accelerate the search for $F$ within this family, we consider a reference 
policy, e.g., a policy $\sigma_B$ extracted from an (approximation of the) belief MDP, 
and shrink the FSC family. For each observation $z \in Z$, we collect the set 
$\mathit{Act}[\sigma_B](z) := \{\sigma_B(b) \mid b \in \mathcal{B}_{\mathcal{M}}, O(b) = z\}$ of actions that were selected by $\sigma_B$ in 
beliefs with observation $z$. The set $\mathit{Act}[\sigma_B](z)$ contains the actions used by the 
reference policy when in observation $z$. We focus the search on these actions by 
constructing a subset of FSCs $\{(N, n_0, \gamma, \delta) \in \mathcal{F}^{\mathcal{M}}_k \mid \forall n \in N, z \in Z.\ \gamma(n,z) \in 
\mathit{Act}[\sigma_B](z)\}$. 
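
A minimal sketch of this restriction is given below. The function names and the dictionary encoding of the reference policy are assumptions for illustration; in particular, the sketch follows the set definition literally, so an observation never visited by the reference policy yields an empty action set, whereas a practical implementation might fall back to all actions there (also an assumption).

```python
from collections import defaultdict


def reference_actions(sigma_B, obs_of_belief):
    """Collect, per observation, the actions used by the reference policy.

    Assumed (hypothetical) encoding:
      sigma_B[b]        -> action chosen by the reference policy in belief b
      obs_of_belief[b]  -> the unique observation of belief b
    """
    act = defaultdict(set)
    for b, a in sigma_B.items():
        act[obs_of_belief[b]].add(a)
    return dict(act)


def in_restricted_family(gamma, act_ref):
    """Check whether an FSC action function gamma[(n, z)] only uses actions
    that the reference policy uses under the same observation."""
    return all(a in act_ref.get(z, set()) for (_, z), a in gamma.items())
```

Filtering candidate controllers with `in_restricted_family` corresponds to searching the restricted family first, before falling back to the complete one.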

Restricting the action selection may exclude the optimal k-FSC. It also does 
not guarantee that the optimal FSC in the restricted family achieves the same 
value as the reference policy $\sigma_B$, as $\sigma_B$ may have more memory nodes. We first 
search the restricted space of FSCs before searching the complete space. This 
also accelerates the search: The earlier a good policy is found, the easier it is to 
discard other candidates (because they are provably not optimal). Furthermore, 
in case the algorithm terminates earlier (notice the anytime aspect of our problem 
statement), we are more likely to have found a reasonable policy. 


Fig. 3. (a) A POMDP where colours and capital letters encode observations; unlabelled 
transitions have probability 1/2; omitted actions (e.g. action £ in the initial state) are 
self-loops; the objective is to minimise the expected number of steps to reach state G. 
(b) The optimal posterior-aware 2-FSC. (Color figure online) 


Additionally, we could use sets $\mathit{Act}[\sigma_B]$ to determine with which $k$ to search. 
If in some observation $z \in Z$ the belief policy $\sigma_B$ uses $|\mathit{Act}[\sigma_B](z)|$ distinct 
actions, then in order to enable the use of all of these actions, we require at least 
$k = \max_{z \in Z} |\mathit{Act}[\sigma_B](z)|$ memory states. However, this may lead to families that 
are too large and thus we use a more refined view discussed below. 


5.3 Inductive Synthesis with Adequate FSCs 


In this section, we discuss the set of candidate FSCs in more detail. In particular, 
we take a more refined look at the families that we consider. 
More Granular FSCs. We consider memory models [5] that describe, per
observation, how much memory may be used:


Definition 7 ($\mu$-FSC). A memory model for POMDP $M$ is a function
$\mu \colon Z \to \mathbb{N}$. Let $k = \max_{z \in Z} \mu(z)$. The k-FSC
$F \in \mathcal{F}^M_k$ with nodes $N = \{n_0, \dots, n_{k-1}\}$ is a $\mu$-FSC
iff for all $z \in Z$ and for all $i \geq \mu(z)$ it holds:
$\gamma(n_i, z) = \gamma(n_0, z)$ and $\delta(n_i, z, z') = \delta(n_0, z, z')$
for any $z' \in Z$.


$\mathcal{F}^M_\mu$ denotes the family of all $\mu$-FSCs. Essentially, memory
model $\mu$ dictates that for prior observation $z$ only $\mu(z)$ memory nodes
are utilised, while the rest behave exactly as the default memory node $n_0$.
Using a memory model $\mu$ with $\mu(z) < k$ for some observations $z \in Z$
greatly reduces the number of candidate controllers. For example, if $|S_z| = 1$
for some $z \in Z$, then upon reaching this state, the history becomes
irrelevant. It is thus sufficient to set $\mu(z) = 1$ (for the specifications in
this paper). It also significantly reduces the size of the abstraction, see
Appendix A of [3].
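As a small illustration of Definition 7, the following sketch checks whether a
given k-FSC respects a memory model. It is only a sketch under assumptions: memory
nodes are encoded as integers 0..k-1 with node 0 as the default node, the
functions gamma and delta are plain dictionaries, and the index condition is read
as i >= mu(z), so that exactly mu(z) nodes remain unconstrained for observation z.
The function name `is_mu_fsc` is hypothetical.

```python
def is_mu_fsc(observations, gamma, delta, mu):
    """Check that nodes beyond the mu(z) utilised ones mimic the default node 0."""
    k = max(mu[z] for z in observations)
    for z in observations:
        for i in range(mu[z], k):  # nodes n_i with i >= mu(z)
            if gamma[(i, z)] != gamma[(0, z)]:
                return False
            if any(delta[(i, z, z2)] != delta[(0, z, z2)] for z2 in observations):
                return False
    return True


# Hypothetical toy controller with two memory nodes and two observations.
observations = ["a", "b"]
mu = {"a": 2, "b": 1}  # observation "b" may use a single memory node only
gamma = {(0, "a"): "x", (1, "a"): "y", (0, "b"): "x", (1, "b"): "x"}
delta = {(n, z, z2): 0 for n in range(2) for z in observations for z2 in observations}
delta[(0, "a", "b")] = 1  # allowed: observation "a" may use both nodes freely

broken_gamma = dict(gamma)
broken_gamma[(1, "b")] = "y"  # node 1 deviates from node 0 in observation "b"

print(is_mu_fsc(observations, gamma, delta, mu))
print(is_mu_fsc(observations, broken_gamma, delta, mu))
```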


Posterior-aware or Posterior-unaware. The technique outlined in [5] considers
posterior-unaware FSCs [2]. An FSC with update function $\delta$ is
posterior-unaware if the posterior observation is not taken into account when
updating the memory node of the FSC, i.e., $\delta(n, z, z') = \delta(n, z, z'')$
for all $n \in N$ and $z, z', z'' \in Z$. This restriction reduces the policy
space and thus the MDP abstraction $MDP(\mathcal{F}^M_k)$. On the other hand,
general (posterior-aware) FSCs can utilise information about the next
observation to make an informed decision about the next memory node. As a
result, fewer memory nodes are needed to encode complex policies. Consider
Fig. 3a, which depicts a simple POMDP. First, notice that in yellow states $Y_i$
we want to be able to execute two different actions, implying that we need at
least two memory nodes to distinguish between the two states, and the same is
true for the blue states $B_i$. Second, notice that in each state the visible
action always leads to states having different observations, implying that the
posterior observation $z'$ is crucial for optimal decision making. If $z'$ is
ignored, it is impossible to optimally update the memory node. Figure 3b depicts
the optimal posterior-aware 2-FSC, which reaches the target within 12 steps in
expectation. The optimal posterior-unaware FSC has at least 4 memory nodes and
the optimal posterior-unaware 2-FSC uses 14 steps.


Algorithm 1: Anytime algorithm

Input : POMDP $M$, set $T$ of target states, timeout values $t$, $t_I$, $t_B$
Output: Best FSCs $F_I$ and $F_B$ found so far

 1  $F_I \leftarrow \bot$, $\mathcal{F} \leftarrow \mathcal{F}^M_1$, $k \leftarrow 1$, $\mu \leftarrow \{z \mapsto 1 \mid z \in Z\}$, $F_B \leftarrow \bot$, $\sigma_B \leftarrow \bot$
 2  while not timeout $t$ do
 3      while not timeout $t_I$ do
 4          if $\mathcal{F} = \emptyset$ then
 5              $k \leftarrow k + 1$
 6              $\forall z \in Z$: $\mu(z) \leftarrow \max\{\mu(z), k\}$
 7              $\mathcal{F} \leftarrow \mathcal{F}^M_\mu$
 8          $\mathcal{F}, F_I \leftarrow$ search($\mathcal{F}$, $F_I$, $Act[\sigma_B]$ if $P^{M^{F_B}}[\Diamond T] > P^{M^{F_I}}[\Diamond T]$ else $\bot$)
 9      $\sigma_B, F_B \leftarrow$ explore($t_B$, $F_I$)
10      if $P^{M^{F_I}}[\Diamond T] < P^{M^{F_B}}[\Diamond T]$ and $\exists z \in Z$: $\mu(z) < |Act[\sigma_B](z)|$ then
11          $\forall z \in Z$: $\mu(z) \leftarrow |Act[\sigma_B](z)|$
12          $\mathcal{F} \leftarrow \mathcal{F}^M_\mu$
13      yield $F_I$, $F_B$


MDP Abstraction. To efficiently and precisely create and analyse MDP abstrac- 
tions, Definition 6 is overly simplified. In Appendix A of [3], we present the 
construction for general, posterior-aware FSCs including memory models. 


6 Integrating Belief Exploration with Inductive Synthesis 


We clarify the symbiotic approach from Fig. 1 and review FSC sizes. 


Symbiosis by Closing the Loop. Section 4 shows the potential to improve
belief exploration using FSCs, e.g., obtained from an inductive synthesis loop, 
whereas Sect. 5 shows the potential to improve inductive synthesis using policies 
from, e.g., belief exploration. A natural next step is to use improved inductive 
synthesis for belief exploration and improved belief exploration for inductive 
synthesis, i.e., to alternate between both techniques. This section briefly clarifies 
the symbiotic approach from Fig. 1 using Algorithm 1. 


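Table 1. Sizes of different types of FSCs.


FSC class                        size($\gamma$)              size($\delta$)
k-FSC                            $k \cdot |Z|$               $2 \cdot \sum_{n \in N} \sum_{z \in Z} |post(n, z)|$
$\mu$-FSC                        $\sum_{z \in Z} \mu(z)$     $2 \cdot \sum_{z \in Z} \sum_{i < \mu(z)} |post(n_i, z)|$
posterior-unaware $\mu$-FSC      $\sum_{z \in Z} \mu(z)$     $\sum_{z \in Z} \mu(z)$
$F_B$ using $F_I$ for cut-offs   size($\gamma_I$) + $|E|$    size($\delta_I$) + $2 \cdot \sum_{b \in E} |post(b, O(b))|$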


We iterate until a global timeout $t$: in each iteration, we make both
controllers available to the user as soon as they are computed (Algorithm 1,
l. 13). We start in the inductive mode (l. 3-8), where we initially consider the
1-FSCs represented in $\mathcal{F}^M_1$. Method search (l. 8) investigates
$\mathcal{F}$ and outputs the new maximising FSC $F_I$ (if it exists). If the
timeout $t_I$ interrupts the synthesis process, the method additionally returns
the yet unexplored parameter assignments. If $\mathcal{F}$ is fully explored
within the timeout $t_I$ (l. 4), we increase $k$ and repeat the process. After
the timeout $t_I$, we run belief exploration explore for $t_B$ seconds, where we
use $F_I$ as backup controllers (l. 9). After the timeout $t_B$ (exploration
will continue from a stored configuration in the next belief phase), we use
$F_I$ to obtain cut-off values at unexplored states, compute the optimal policy
$\sigma_B$ (see Sect. 4) and extract the FSC $F_B$ which incorporates $F_I$.
Before we continue the search, we check whether the belief-based FSC is better
and whether that FSC gives any reason to update the memory model (l. 10). If so,
we update $\mu$ and reset $\mathcal{F}$ (l. 11-12).
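The memory-model update at the end of each iteration (l. 10-12 of Algorithm 1)
can be sketched as follows for a maximising objective. This is an illustration
under assumptions rather than the SAYNT implementation: the values of both
controllers are passed in as plain numbers, the per-observation action sets of
the belief policy are a dictionary of sets, and the guard `max(1, ...)` for
observations without any explored belief is an addition of this sketch. The name
`update_memory_model` is hypothetical.

```python
def update_memory_model(mu, act, value_inductive, value_belief):
    """Return (new memory model, whether the family has to be reset)."""
    belief_fsc_is_better = value_inductive < value_belief
    needs_more_memory = any(mu[z] < len(act.get(z, ())) for z in mu)
    if belief_fsc_is_better and needs_more_memory:
        # Give every observation as many memory nodes as the belief policy uses actions.
        new_mu = {z: max(1, len(act.get(z, ()))) for z in mu}
        return new_mu, True
    return dict(mu), False


# Hypothetical toy data.
mu = {"yellow": 1, "blue": 1}
act = {"yellow": {"left", "right"}, "blue": {"up"}}

print(update_memory_model(mu, act, value_inductive=0.5, value_belief=0.8))
print(update_memory_model(mu, act, value_inductive=0.9, value_belief=0.8))
```

The first call enlarges the memory model because the belief-based controller is
better and uses two actions in one observation; the second call leaves it
untouched because the inductive controller already has the better value.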


The Size of an FSC. We have considered several sub-classes of FSCs and
wish to compare the sizes of these controllers. For FSC $F = (N, n_0, \gamma, \delta)$,
we define its size $size(F) := size(\gamma) + size(\delta)$ as the memory required
to encode functions $\gamma$ and $\delta$. Encoding $\gamma \colon N \times Z \to Act$
of a general k-FSC requires $size(\gamma) = \sum_{n \in N} \sum_{z \in Z} 1 = k \cdot |Z|$
memory. Encoding $\delta \colon N \times Z \times Z \to N$ requires $k \cdot |Z|^2$
memory. However, it is uncommon that in each state-memory pair $(s, n)$ all
posterior observations can be observed. We therefore encode $\delta(n, z, \cdot)$
as a sparse adjacency list, i.e., as a list of pairs $(z', \delta(n, z, z'))$. To
define the size of such a list properly, consider the induced MC
$M^F = (S \times N, (s_0, n_0), \{\alpha\}, P^F)$. Let
$post(n, z) := \{O(s') \mid \exists s \in S_z \colon (s', \cdot) \in supp(P^F((s, n), \alpha))\}$
denote the set of posterior observations reachable when taking a transition in a
state $(s, n)$ of $M^F$ with $O(s) = z$. Table 1 summarises the resulting sizes
of FSCs of various sub-classes. The derivation is included in Appendix B of [3].
Table 4 shows that we typically find much smaller $\mu$-FSCs ($F_I$) than
belief-based FSCs ($F_B$).
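The size measure can be made concrete with a short sketch. It assumes that the
action selection is stored as a dictionary with one entry per (node, observation)
pair and that the memory update is stored as sparse adjacency lists, here a
dictionary mapping (node, observation) to a dictionary from posterior observation
to successor node. Each stored pair is counted as two units, matching the factor 2
in Table 1. The function names are illustrative only.

```python
def sparse_fsc_size(gamma, delta_sparse):
    """size(gamma) + size(delta) with delta encoded as lists of (z', successor) pairs."""
    size_gamma = len(gamma)  # one action per (node, observation) pair
    size_delta = sum(2 * len(successors) for successors in delta_sparse.values())
    return size_gamma + size_delta


def dense_fsc_size(k, num_observations):
    """Size of the naive encoding: k*|Z| for gamma plus k*|Z|^2 for delta."""
    return k * num_observations + k * num_observations ** 2


# Hypothetical 2-FSC over observations {"a", "b"} where few posteriors are reachable.
gamma = {(0, "a"): "x", (1, "a"): "y", (0, "b"): "x", (1, "b"): "x"}
delta_sparse = {
    (0, "a"): {"b": 1},
    (1, "a"): {"a": 0},
    (0, "b"): {"a": 0},
    (1, "b"): {},
}

print(sparse_fsc_size(gamma, delta_sparse), dense_fsc_size(2, 2))
```

In this toy instance only three posterior pairs are stored, so the sparse encoding
is smaller than the dense one.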


7 Experiments 


Our evaluation focuses on the following three questions: 


Q1: Do the FSCs from inductive synthesis raise the accuracy of the belief MDP? 
Q2: Does exploiting the belief MDP boost the inductive synthesis of FSCs? 
Q3: Does the symbiotic approach improve the run time and the controller's value and size?


Table 2. Information about the benchmark POMDPs.


Model        $|S|$  $\sum Act$  $|Z|$  Spec.  Over-approx.
4x3-95       22     82          9      Rmax   < 2.24
4x5x2-95     79     310         7      Rmax   < 3.26
Hallway      61     301         23     Rmin   > 11.5
Milos-97     165    980         11     Rmax   < 80
Network      19     70          5      Rmax   < 359
Query-s3     108    320         6      Rmax   < 600
Tiger-95     14     50          7      Rmax   < 159
Drone-4-2    1226   2954        761    Pmax   < 0.98
Drone-8-2    13k    32k         3195   Pmax   < 0.99
Lanes+       2741   5285        11     Rmin   > 4805
Netw-3-8-20  17k    30k         2205   Rmin   > 4.31
Refuel-06    208    565         50     Pmax   < 0.78
Refuel-20    6834   25k         174    Pmax   < 0.99
Rocks-12     6553   32k         1645   Rmin   > 17.8


Selected Benchmarks and Setup. Our baselines are the recent belief exploration
technique [8] implemented in STORM [13] and the inductive (policy) synthesis
method [5] implemented in PAYNT [6]. PAYNT uses STORM for parsing and model
checking of MDPs, but not for solving POMDPs. Our symbiotic framework
(Algorithm 1) has been implemented on top of PAYNT and STORM. In the following,
we use STORM and PAYNT to refer to the implementation of belief exploration and
inductive synthesis, respectively, and SAYNT to refer to the symbiotic
framework. The implementation of SAYNT and all benchmarks are publicly
available⁵. Additionally, the implementation and the benchmarks in the form of
an artifact are also available at https://doi.org/10.5281/zenodo.7874513.


Setup. The experiments are run on a single core of a machine equipped with 
an Intel i5-12600KF @4.9GHz CPU and 64GB of RAM. PAYNT searches for 
posterior-unaware FSCs using abstraction-refinement, as suggested by [5]. By 
default, STORM applies the cut-offs as presented in Sect. 4.1. SAYNT uses the 
default settings for PAYNT and STORM, while $t_I$ = 60s and $t_B$ = 10s were taken
for Algorithm 1. Under Q3, we discuss the effect of changing these values. 


Benchmarks. We evaluate the methods on a selection of models from [5,7,8] 
supplemented by larger variants of these models (Drone-8-2 and Refuel-20), by 
one model from [16] (Milos-97) and by the synthetic model (Lanes+) described 
in Appendix C of [3]. We excluded benchmarks for which PAYNT or STORM 
finds the (expected) optimal solution in a matter of seconds. The benchmarks 
were selected to illustrate advantages as well as drawbacks of all three synthe- 
sis approaches: belief exploration, inductive (policy) search, and the symbiotic 
technique. Table 2 lists for each POMDP the number $|S|$ of states, the total
number $\sum Act := \sum_{s} |Act(s)|$ of actions, the number $|Z|$ of
observations, the specification (either maximising or minimising a reachability
probability P or expected reward R), and a known over-approximation of the
optimal value computed using the technique from [7]. These over-approximations
are solely used as rough estimates of the optimal values. Table 5 reports the
quality of the resulting FSCs on a broader range of benchmarks and demonstrates
the impact of the non-default settings.


Q1: FSCs provide better approximations of the belief MDP 


In these experiments, PAYNT is used to obtain a sub-optimal $F_I$ within 10s,
which is then used by STORM. Table 3 (left) lists the results. Our main finding
is that belief exploration can yield better FSCs (and sometimes faster) using
FSCs from PAYNT, even if the latter FSCs are far from optimal. For instance,
STORM with provided $F_I$ finds an FSC with value 0.97 for the Drone-4-2
benchmark within a total of 10s (1s + 9s for obtaining $F_I$), compared to
obtaining an FSC of value 0.95 in 56s on its own. A value improvement is also
obtained if STORM runs longer. For the Network model, the value improves by 47%
(short-term) and 37% (long-term), respectively, at the expense of investing 3s
to find $F_I$. For the other models, the relative improvement ranges from 3% to
25%. A further value improvement can be achieved when using better FSCs $F_I$
from PAYNT; see Q3. Sometimes, belief exploration does not profit from $F_I$.
For Hallway, the unexplored part of the belief MDP becomes insignificant rather
quickly, and so does the impact of $F_I$. Clipping [8], a computationally
expensive extension of cut-offs, is beneficial only for Rocks-12, rendering
$F_I$ useless. Even in this case, though, using $F_I$ significantly improves
Short STORM, which did not have enough time to apply clipping.


5 https://github.com/randriu/synthesis.


Q2: Belief-based FSCs improve inductive synthesis 


In this experiment, we run STORM for at most 1s and use the result in PAYNT.
Table 3 (right) lists the results. Our main finding is that inductive synthesis
can find much better FSCs, and sometimes much faster, when using FSCs from
belief exploration. For instance, for the 4x5x2-95 benchmark, an FSC is obtained
about six times faster while improving the value by 116%. On some larger models,
PAYNT alone struggles to find any good $F_I$ and using $F_B$ boosts this; e.g.,
the value for the Refuel-20 model is raised by a factor of 20 at almost no run
time penalty. For the Tiger benchmark, a value improvement of 860% is achieved
(albeit not as good as $F_B$ itself) at the expense of doubling the run time.
Thus, even a shallow exploration of the belief MDP pays off in the inductive
synthesis. The inductive search typically profits even more when exploring the
belief MDP further. This is demonstrated, e.g., in the Rocks-12 model: using the
FSC $F_B$ computed using clipping (see Table 3 (left)) enables PAYNT to find an
FSC $F_I$ with the same (optimal) value 20 as $F_B$ within 1s. Similarly, for
the Milos-97 model, running STORM for 45s (producing a more precise $F_B$)
enables PAYNT to find an FSC $F_I$ achieving a better value than controllers
found by STORM or PAYNT alone within the timeout. (These results are not
reported in the tables.) However, as opposed to Q1, where a better FSC $F_I$
naturally improves the belief MDP, exploring the belief MDP for longer does not
always yield a better $F_I$: a larger explored belief MDP with a better $F_B$
may yield a larger memory model $\mu$, thus inducing a significantly larger
family where PAYNT struggles to identify good FSCs.
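The relative improvements quoted for Q2 follow from the values in Table 3
(right) by simple arithmetic. The sketch below recomputes two of them for
maximising objectives; the helper name `relative_improvement` is chosen for this
illustration.

```python
def relative_improvement(baseline, improved):
    """Relative value improvement in percent (for maximising objectives)."""
    return 100.0 * (improved - baseline) / baseline


# Values of F_I found by PAYNT alone and by PAYNT using F_B, taken from Table 3 (right).
improvement_4x5x2 = relative_improvement(0.94, 2.03)
improvement_tiger = relative_improvement(2.99, 28.73)

print(f"4x5x2-95: {improvement_4x5x2:.1f}%")
print(f"Tiger-95: {improvement_tiger:.1f}%")
```

Both results agree, up to rounding, with the 116% and roughly 860% reported in
the text.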


Q3: The practical benefits of the symbiotic approach 


The goals of these experiments are to investigate whether the symbiotic
approach improves the run time (can FSCs of a certain value be obtained faster?),
the memory footprint (how is the total memory consumption affected?), the
controller's value (can better FSCs be obtained with the same computational
resources?) and the controller's size (are more compact FSCs obtained?).


Value of the Synthesised FSCs. Figure 4 plots the value of the FSCs produced
by STORM, PAYNT, and SAYNT versus the computation time. Note that for
maximal objectives, the aim is to obtain a high value (the first 4 plots) whereas
for minimal objectives a lower value prevails. From the plots, it follows that the
FSCs from the symbiotic approach are superior in value to the ones obtained by
the standalone approaches. The relative improvement of the value of the resulting
FSCs differs across individual models, similar to the trends in Q1 and Q2. When
comparing the best FSC found by STORM or PAYNT alone with the best FSC
found by SAYNT, the improvement ranges from negligible (4x3-95) to around
3%-7% (Netw-3-8-20, Milos-97, Query-s3) and sometimes goes over 40%
(Refuel-20, Lanes+). We note that the distance to the (unknown) optimal values
remains unclear. The FSC value never decreases but sometimes does not increase
either, as indicated by Hallway and Rocks-12 (see also Q2). Our experiments (see
Table 5) also indicate that the improvement over the baseline algorithms is
typically more significant in the larger variants of the models. Furthermore,
the plots in Fig. 4 also include the FSC value by the one-shot combination of
STORM and PAYNT. We see that SAYNT can improve the FSC value over the one-shot
combination. This is illustrated in, e.g., the 4x3-95 and Lanes+ benchmarks, see
the 1st and 3rd plots in Fig. 4 (left).


Table 3. Left (Q1): Experimental results on how a (quite sub-optimal) FSC $F_I$
computed by PAYNT within 10s impacts STORM. (For Drone-8-2, the largest model
in our benchmark, we use 30s.) The "PAYNT" column indicates the value of $F_I$ and
its run time. The "Short STORM" columns run STORM for 1s and compare the value
of FSC $F_B$ found by STORM alone to STORM using $F_I$. The "Long STORM" columns
are analogous, but with a 300s timeout for STORM. In the last row, * indicates
that clipping was used. Right (Q2): Experimental results on how an FSC $F_B$
obtained by a shallow exploration of the belief MDP impacts the inductive
synthesis by PAYNT. The "STORM" column reports the value of $F_B$ computed within
1s. The "PAYNT" columns compare the values of the FSCs $F_I$ obtained by PAYNT
itself to PAYNT using the FSCs $F_B$ within a 300s timeout. Each cell reports the
value and, in parentheses, the run time.


Left (Q1):

Model      Spec.  PAYNT        Short STORM                Long STORM
                  $F_I$        $F_B$        +$F_I$        $F_B$         +$F_I$
Drone-4-2  Pmax   0.94 (9s)    0.92 (1s)    0.97 (1s)     0.95 (56s)    0.97 (57s)
Network    Rmax   266.1 (3s)   186.7 (<1s)  274.5 (<1s)   202.1 (26s)   277.1 (33s)
Drone-8-2  Pmax   0.9 (28s)    0.6 (3s)     0.96 (3s)     0.68 (101s)   0.97 (103s)
4x3-95     Rmax   1.66 (7s)    1.62 (<1s)   1.82 (<1s)    1.84 (60s)    1.88 (72s)
Query-s3   Rmax   425.2 (7s)   417.4 (2s)   430.0 (2s)    419.6 (91s)   432.0 (94s)
Milos-97   Rmax   31.56 (3s)   37.15 (<1s)  39.15 (<1s)   38.35 (42s)   40.64 (42s)
Hallway    Rmin   16.05 (9s)   13.07 (1s)   12.63 (1s)    12.55 (160s)  12.55 (167s)
Rocks-12   Rmin   42 (<1s)     38 (<1s)     31.89 (<1s)   20* (10s)     20* (10s)

Right (Q2):

Model        Spec.  STORM        PAYNT
                    $F_B$        $F_I$         +$F_B$
4x5x2-95     Rmax   2.08 (<1s)   0.94 (258s)   2.03 (38s)
Refuel-20    Pmax   0.09 (1s)    <0.01 (10s)   0.19 (11s)
Tiger-95     Rmax   50.38 (<1s)  2.99 (14s)    28.73 (23s)
4x3-95       Rmax   1.62 (<1s)   1.75 (14s)    1.84 (238s)
Refuel-06    Pmax   0.67 (<1s)   0.35 (<1s)    0.67 (42s)
Milos-97     Rmax   37.15 (<1s)  31.56 (3s)    39.29 (215s)
Netw-3-8-20  Rmin   11.93 (1s)   11.07 (185s)  10.95 (271s)
Rocks-12     Rmin   38 (<1s)     42 (<1s)      38 (<1s)
Fig. 4. Value of the generated FSCs over time. The last graph shows the average
memory usage of STORM and SAYNT. The lines ending before the timeout indicate
that the 64GB memory limit was hit. • indicates that PAYNT and SAYNT synthesised
posterior-aware FSCs. ⊙ indicates that SAYNT ran with $t_I$ = 90s. (Color figure online)


Total Synthesis Time. SAYNT initially needs some time for the first iteration 
(one inductive and one belief phase) in Algorithm 1 and thus during the begin- 
ning of the synthesis process, the standalone tools may provide FSCs of a certain 
value faster. After the first iteration, however, SAYNT typically provides better 
FSCs in a shorter time. For instance, for the Refuel-20 benchmark SAYNT swiftly 
overtakes STORM after the first iteration. The only exception is Rocks-12 (dis- 
cussed before), where SAYNT with the default settings needs significantly more 
time than STORM to obtain an FSC of the same value. 
Table 4. Trade-offs between the value and size in the resulting FSCs $F_I$ and
$F_B$ found by SAYNT. Each cell reports value/size. The first three models have
a minimising objective. ⊙ indicates that SAYNT ran with $t_I$ = 90s.


Models  Lanes+     Hallway   Netw-3-8-20  Query-s3⊙    Refuel-06  Drone-8-2  Refuel-20
$F_B$   4805/8.1k  12.55/2k  10/40k       511.32/7.7k  0.67/84    0.96/237k  0.24/1.5k
$F_I$   6591/34    15.46/86  11.04/4.8k   509.49/26    0.67/156   0.90/6.4k  0.2/362


Memory Footprint. Belief exploration typically has a large memory footprint: 
STORM quickly hits the 64GB memory limit on exploring the belief MDP. SAYNT 
reduces the memory footprint of STORM alone by a factor 3 to 4, see the bottom 
right plot of Fig. 4. The average memory footprint of running PAYNT standalone 
quickly stabilises around 700MB. The memory footprint of SAYNT is thus dom- 
inated by the restricted exploration of the belief MDP. 


The Size of the Synthesised FSCs. For selected models, Table 4 shows the
trade-offs between the value and size of the resulting FSCs $F_I$ and $F_B$ found
by SAYNT. The experiments show that the FSCs $F_I$ provided by inductive synthesis
are typically about one to two orders of magnitude smaller than the belief-based
FSCs $F_B$ with only a small penalty in their values. There are models (e.g.
Refuel-06) where a very small $F_B$, having even slightly smaller size than
$F_I$, does exist. The integration mostly reduces the size of $F_B$, by up to a
factor of two, due to the better approximation of the belief MDP. This reduction
has a negligible effect on the size of $F_I$. This observation further
strengthens the usefulness of SAYNT, which jointly improves the value of $F_I$
and $F_B$. Hence, SAYNT gives users a unique
opportunity to run a single, time-efficient synthesis and select the FSC according 
to the trade-off between its value and size. 


Customising the SAYNT Setup. In contrast to the standalone approaches as well 
as to the one-way integrations presented in Q1 and Q2, SAYNT provides a single 
synthesis method that is efficient for a general class of models without tuning its 
parameters. Naturally, adjusting the parameters to individual benchmarks can 
further improve the quality of the computed controllers: captions of Fig. 4 and 
Table 4 describe which non-default settings were used for selected models. 


Additional Results 


In Table 5, we compare values and sizes of FSCs synthesised by the particular
methods on a broader range of benchmarks. We can see that FSCs $F_I$ obtained
by SAYNT achieve better values than the controllers computed by PAYNT;
size-wise, these better FSCs of SAYNT are similar or only slightly bigger.
Meanwhile, for FSCs $F_B$ obtained by SAYNT, we sometimes observe a significant
size reduction while still improving the value compared to the FSCs produced by
STORM. Two models are notable: on Drone-8-2, SAYNT obtains a 50% smaller $F_B$
while having a 41% better value. On Netw-3-8-20, the size of $F_B$ is reduced by
40% while again providing a better value.
Table 5. The quality and size of resulting FSCs provided by PAYNT, STORM, and
SAYNT within the 15-min timeout. The run times indicate the time needed to find the
best FSC. Non-default settings: * marks experiments where clipping was enabled, •
marks experiments where PAYNT synthesised posterior-aware FSCs, ⊙ marks
experiments where integration parameter $t_I$ was set to 90s.


Benchmark Model Size PAYNT STORM SAYNT 
Model Spec. || S/X Act Z Fr | Size Fg Size Fg Size Fr Size 
1.89¢ 968| 1.87e 126 
4x3 22 1.81 1.87 2835 120s 
95 max s2 || 764s) 35!) 414s| 999/189] 869 1.79 36 
303s 678s 
BED p 77 || 0.94) 26], 2.08] 102 2.08 102 2.03 38 
95 max 310 305s 38 71s 378s 
0.890 169k! 0.876 25k 
Drone 1226 0.87 0.84 390s 4538 
41 Pmax |! 3996 384| 665s 78) 1105| Y| 0.89 176k 0.79 922 
180s 45s 
Drone p 1226 „a | 0.95 15k) 0.95] 135k|| 0.97 140k 0.94 1.5K 
42 Pmax | 3026 900s 110s 194s Is 
Drone p I3k 395 | O9 6.4k| 0.68] 280k|| 0.96 140k 0.9 6.4K 
8-2 pmax 32k 260s 98s 247s 30s 
61 15.54| 66| 12.55] 1.9k|| 12.55| 1.8k| 15.46) 86 
Hallway Rmin 301 | 26s 916s 263s 293s 
ue 2741 | | 8223 42| 18870| 81k] 4805 81k 6591/34 
min || 5289 118s 376s 173s 114s 
41.990 692 35.820 40 
165 31.56 39.03 370s 185s 
Milos-97 Rmax gs0 H 4s| | sss) 83-2155 2900 3541 10 
270s 114s 
289.18e 2k 287.23e 54 
19 280.33 209.71 395s 106s 
Network Rmax 7 5 38s. 2| 1108| 745| 28451 18k) 280.33 22 
85s Als 
Netw p 4580 5473) 424/23k|/ 3.21) 34k 3.2 23k 4.19 2.5k 
2-8-20 Pmin | 6973 914s 11s 71s 211s 
Netw p I7k p29; | 11-04 44k| 10.27] 64k 10) 38k 11.04 4.8k 
3-8-20 Pin 30k 638s 238s 742s 379s 
511.320 7.7k 509.490 26 
Query 108 502.3 420.11 5665 362s 
53 max 320 | g3is| | 1g4s|12-9*||—agaa1 7.7k 478.59 28 
700s 610s 
Refuel p 208, | 0.35 100] 0.67) 343 0.67 84 0.67| 156 
06 max 565 <1s 182s 178s 845 
Refuel p 470g | 032 2| 0.44) 5a4|| 0.45 140 0.3) 142 
os Fmax | 1431 253s 96s 186s 84s 
Refuel p 6834 57, | 0.02/ 348) 0.15) 12k|| 024 15k 0.2) 360 
20 max 24k 9225 4685 386s 1738 
Rocks p BOS ani 42 3.3k| 20*| 115 20* 115| 20+ 3.3k 
12 min 32k <is 15s 2355 236s 
Tiger p 14 „|| 793| 34) 50.38) 58|] 50.38) 58| 3161| 48 
95 max 50 547s <is 71s 513s 


In the following, we further discuss the impact of non-default settings for
selected benchmarks, as presented in Table 5. For instance, using
posterior-aware FSCs generally slows down the synthesis process significantly;
however, for Network and 4x3-95, it helps improve the value of the default
posterior-unaware FSCs by 2% and 4%, respectively. For the former model, a
better $F_I$ also improves $F_B$ by a similar amount. In some cases, e.g. for
Query-s3, it is beneficial to increase the parameter $t_I$, giving PAYNT enough
time to search for a good FSC $F_I$ (the relative improvement is 6%), which also
improves the value of the resulting FSC $F_B$ by a similar amount. Tuning $t_I$
and $t_B$ can also have an impact on the value-size trade-off, as seen in the
Milos-97 model, where setting a longer timeout $t_I$ results in finding a 2%
better $F_B$ with a 130% size increase. A detailed analysis of the experimental
results suggests that it is usually more beneficial to invest time into
searching for a good $F_I$ that is used to compute better cut-off values, rather
than into deeper exploration of the belief MDP. However, the timeouts still need
to allow for multiple subsequent iterations of the algorithm in order to utilise
the full potential of the symbiosis.


8 Conclusion and Future Work 


We proposed SAYNT, a symbiotic integration of the two main approaches for 
controller synthesis in POMDPs. Using a wide class of models, we demonstrated 
that SAYNT substantially improves the value of the resulting controllers and 
provides an any-time, push-button synthesis algorithm allowing users to select 
the controller based on the trade-off between its value and size, and the synthe- 
sis time. 

In future work, we plan to explore whether inductive policy synthesis can also
be successfully combined with point-based approximation methods, such as
SARSOP, and applied to discounted reward properties. A preliminary comparison on
discounting properties provides two interesting observations: 1) For models with
large reachable belief space and discount factors (very) close to one, SARSOP 
typically fails to update its initial alpha-vectors and thus produces low-quality 
controllers. In these cases, SAYNT outperforms SARSOP. 2) For common dis- 
count factors, SARSOP beats SAYNT on the majority of benchmarks. This is 
not surprising, as the MDP engine underlying SAYNT does not natively support
discounting and instead computes a much harder fixed point. See [15] for
a recent discussion on the differences between discounting and not discounting.
Abstract. We present a specification language and a fully automated
tool named AUTOQ for verifying quantum circuits symbolically. The
tool implements the automata-based algorithm from [14] and extends
it with the capabilities for symbolic reasoning. The extension allows
specifying relational properties, i.e., relationships between states before and
after executing a circuit. We present a number of use cases where we
used AUTOQ to fully automatically verify crucial properties of several
quantum circuits, which have, to the best of our knowledge, so far been
proved only with human help.


1 Introduction 


Recently, quantum computing has received much attention, driven by several 
technological breakthroughs [7] and increasing investments. Prototype quan- 
tum computers are already available. The opportunities for the general public— 
particularly students, researchers, and technology enthusiasts—to access quan- 
tum computing devices are rapidly increasing, e.g., through cloud services such as 
Amazon Braket [1] or IBM Quantum [2]. Due to the complexity and probabilistic 
nature of quantum computing, the chance of errors in quantum programs is much 
higher than that of traditional programs, and conventional means for correct- 
ness assurance, such as testing, are much less applicable in the quantum world. 
Quantum programmers need better tools to help them write correct programs. 
Therefore, researchers anticipate that formal verification will play a crucial role 
in quantum software quality assurance and have, in recent years, invested signif- 
icant effort in this direction [5,11,21,41-43, 45,46]. Nevertheless, practical tools 
for automated quantum program/circuit verification are still missing. 

This paper introduces AUTOQ¹, a fully automated tool for quantum circuit
verification based on the approach proposed in [14]. In particular, AUTOQ checks
the validity of a Hoare-style specification {Pre} C {Post}, where C is a quantum
circuit (a sequence of quantum gates) in the OPENQASM format [17] and the
precondition Pre and postcondition Post represent sets of (pure) quantum states.
The check is done by executing the circuit with all quantum states satisfying Pre
(using a symbolic representation) and testing that all resulting quantum states
are in the set denoted by Post.


1 Available at https://github.com/alan23273850/AutoQ.
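To make the meaning of a triple {Pre} C {Post} concrete, the following sketch
checks it by explicit enumeration for finite sets of states. This is not the
symbolic, automata-based procedure of AUTOQ: states are plain dictionaries from
bit strings to amplitudes, only the X and CNOT gates are modelled, and all
function names are hypothetical.

```python
def flip(basis, target):
    """Flip the bit at position `target` of a basis label such as "010"."""
    bit = "1" if basis[target] == "0" else "0"
    return basis[:target] + bit + basis[target + 1:]


def apply_x(state, target):
    """Pauli-X gate on qubit `target`."""
    return {flip(basis, target): amp for basis, amp in state.items()}


def apply_cnot(state, control, target):
    """CNOT gate: flip `target` on the basis states where `control` is 1."""
    return {(flip(basis, target) if basis[control] == "1" else basis): amp
            for basis, amp in state.items()}


def check_triple(pre, circuit, post):
    """True iff running the circuit on every state of `pre` yields a state in `post`."""
    for state in pre:
        for gate, *qubits in circuit:
            state = gate(state, *qubits)
        if state not in post:
            return False
    return True


# Hypothetical 2-qubit example: X on qubit 0 followed by CNOT(0, 1).
circuit = [(apply_x, 0), (apply_cnot, 0, 1)]
pre = [{"00": 1}, {"10": 1}]

print(check_triple(pre, circuit, post=[{"11": 1}, {"00": 1}]))
print(check_triple(pre, circuit, post=[{"11": 1}]))
```

The second check fails because the state |10> is mapped to |00>, which is not
in the smaller postcondition.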

AUTOQ combines two main techniques to efficiently and effectively represent 
and reason about (potentially infinite) sets of quantum states: 


1. As in [14], we use tree automata (TAs), finite-state automata accepting
   languages of trees, to efficiently represent sets of quantum states: each
   quantum state over $n$ qubits can be seen as a binary decision tree over $n$
   variables such that, e.g., in a 3-qubit circuit with qubits
   $|x_1 x_2 x_3\rangle$, if the computational basis state $|010\rangle$ in a
   quantum state has the probability amplitude $a$, then there will be a branch
   $x_1 \xrightarrow{0} x_2 \xrightarrow{1} x_3 \xrightarrow{0} a$ in the
   corresponding tree. The use of a TA-based representation of a set of quantum
   states has several advantages: (a) It is concise: e.g., to represent the set
   of all $2^n$ basis states of an $n$-qubit quantum circuit, a TA with $O(n)$
   states and transitions suffices. (b) It allows quantum gate operations to be
   performed efficiently on the whole set of quantum states represented by a TA
   at once [14].

2. In this work, we further consider symbolic quantum states, represented by
   assigning symbolic values to computational basis states (and having an
   additional formula to relate these symbolic values). For instance, we can
   represent the set of all $n$-qubit quantum states where the computational
   basis state $|0 \dots 0\rangle$ has a strictly larger probability of
   measurement than all other basis states by a symbolic quantum state assigning
   $|0 \dots 0\rangle \mapsto v_h$ and $|y_1 \dots y_n\rangle \mapsto v_\ell$ for
   all $y_1 \dots y_n \neq 0 \dots 0$, together with the formula
   $|v_h|^2 > |v_\ell|^2 \land |v_h|^2 + (2^n - 1)|v_\ell|^2 = 1$, where $v_h$
   and $v_\ell$ are symbolic variables ranging over complex numbers.


By combining these two techniques, i.e., using TAs with symbolic variables 
in leaves, we can have a representation of all n-qubit quantum states where an 
arbitrary basis has a strictly larger amplitude than other basis states using O(n) 
states and transitions. 
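The two ingredients can be illustrated on a single concrete state. The sketch
below stores a 3-qubit state as an explicit binary decision tree and evaluates
the constraint on the two amplitudes for concrete numbers. It is only meant to
convey the data layout: the explicit tree has $2^n$ leaves, whereas the TA-based
encoding described above needs $O(n)$ states and transitions, and the names
`v_high` and `v_low` for the two symbolic amplitudes are chosen for this sketch.

```python
import math


def build_tree(amplitudes, n, prefix=""):
    """Full binary decision tree over n qubits; leaves hold amplitudes (0 if absent)."""
    if len(prefix) == n:
        return amplitudes.get(prefix, 0)
    return {bit: build_tree(amplitudes, n, prefix + bit) for bit in "01"}


def amplitude(tree, basis):
    """Follow the branch labelled by the bits of `basis` down to its leaf."""
    node = tree
    for bit in basis:
        node = node[bit]
    return node


def satisfies_constraint(v_high, v_low, n):
    """|v_high|^2 > |v_low|^2 and |v_high|^2 + (2^n - 1) * |v_low|^2 = 1."""
    p_high, p_low = abs(v_high) ** 2, abs(v_low) ** 2
    return p_high > p_low and math.isclose(p_high + (2 ** n - 1) * p_low, 1.0)


# One concrete instance of the symbolic state for n = 3 (values chosen for illustration).
n = 3
v_high = 0.5
v_low = math.sqrt(0.75 / (2 ** n - 1))
state = {f"{i:0{n}b}": (v_high if i == 0 else v_low) for i in range(2 ** n)}
tree = build_tree(state, n)

print(amplitude(tree, "000"), amplitude(tree, "010"))
print(satisfies_constraint(v_high, v_low, n))
```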

Using such a symbolic encoding is essential to allow us to describe relational 
specifications, e.g., it allows us to express properties like “the probability
amplitude of the basis state $|000\rangle$ is increased after executing the circuit C” (for this,
in the postcondition, we use TAs accepting trees with predicates in leaves, a sub- 
class of symbolic tree automata of [36]). Such a property can then be verified 
by executing the quantum circuit symbolically in the spirit of symbolic execu- 
tion [27] (i.e., such that the values of amplitudes are not complex numbers but, 
instead, symbolic terms) and checking whether all trees in the language of the 
resulting TA satisfy the desired property (using a modified antichain-based algo- 
rithm for testing TA language inclusion [4,10]). Combining TAs and symbolic 
variables as the language for quantum predicates allows full automation and 
can be used to express many crucial properties of quantum circuits, as we will 
demonstrate later. AUTOQ is the first tool implementing this approach. 


Related Work. Our work belongs to the line of Hoare-style verification of
quantum programs, which has been widely discussed in the past [22,29,35,40,44].
This family of approaches follows D’Hondt and Panangaden’s suggestion of using
various Hermitian operators as quantum predicates, resulting in a very powerful yet
complete proof system [20]. However, specifying properties using Hermitian oper- 
ators is often not intuitive and is inconvenient for automation due to their enor- 
mous matrix sizes. Therefore, often these approaches are implemented on top of 
proof assistants such as Coq [9] and ISABELLE [37] and require significant man- 
ual work in proof search. The QBRICKS [12] approach alleviates the difficulty of 
the proof search by combining state-of-the-art theorem provers with decision pro- 
cedures building on top of the WHY3 platform [24]. The approach, however, still 
requires a significant amount of human intervention. 

Regarding other quantum program/circuit/protocol verification tools, circuit
equivalence checkers [5,11,15,26,39] are often quite efficient but less flexible in 
specifying the desired property (only equivalence). They are particularly useful 
in compiler validation; notable tools include QCEC [11], and FEYNMAN [5]. Quan- 
tum model checking supports a rich specification language (flavors of temporal 
logic [23,30,38]) and is more suitable for verifying high-level protocols due to the 
quite limited scalability [6]. One notable tool in this category is QPMC [23]. 
Quantum abstract interpretation [32,43] is particularly efficient in processing 
large-scale circuits, but it grossly over-approximates the state space (it cannot 
verify basic properties of, e.g., Grover’s algorithm) and cannot conclude any- 
thing when verification fails. In contrast, AUTOQ can be conveniently used for 
quantum program development and debugging since it automatically computes 
the exact set of reachable states². The mentioned tools are fully automated but
have different goals or address different parts of the software development cycle 
than AUTOQ. 


Contributions. AUTOQ evolved from a simple prototype used for performance 
evaluation in [14] into a robust tool. In addition, we added the following major 
extensions: 


1. We combined the TA specification with symbolic variables, allowing users to 
specify advanced relational properties of quantum circuits. 

2. We developed a new entailment-checking algorithm for the symbolic TA spec- 
ification based on the antichain algorithm for automata language inclusion 
testing. 

3. We introduced a high-level language to simplify writing TA specifications. 


These improvements are pushing the capabilities of AUTOQ, and also of practical 
quantum circuit verification itself, much further. 


Outline. In Sect. 2, we describe our approach to TA-based specification and
verification of quantum circuits. In Sect. 3, we discuss the new entailment-checking
algorithm for the symbolic TA representation. We discuss the architecture of
AUTOQ in Sect. 4 and demonstrate the use of the specification language and
AUTOQ for automated verification of several case studies in Sect. 5.


2 A predecessor of the presented version of AUTOQ has already caught a bug in QCEC, 
cf. [3]. 




[Figure 1 panels: (a) The precondition TA P; (b) The tree accepted by P; (c) The tree accepted by R;
(d) The postcondition TA Q; (e) The tree accepted by Q]


Fig. 1. Verification of a circuit $C$ amplifying the amplitude of $|00\rangle$ w.r.t. the
specification $\{P, \varphi\}\ C\ \{Q\}$ with $\varphi\colon |v_h + 3v_\ell| > |2v_h|$. $R$ is the TA obtained
by executing $P$ on $C$.


2 Tree Automata-Based Verification of Quantum Circuits 


We will begin with minimal formal definitions of the TA-based specification 
and demonstrate how to use them to verify quantum circuits in AUTOQ with 
examples. We assume a basic knowledge of quantum computation (see, e.g., the 
classical textbook [31]). 

Let us fix a finite set of quantum variables $X = \{x_1, \ldots, x_n\}$ with a linear
ordering (we assume $x_1 < \ldots < x_n$) and a disjoint non-empty leaf alphabet $\Sigma$. We will,
in particular, work with $\Sigma = \Sigma_t \uplus \Sigma_p$ where $\Sigma_t$ is the alphabet of terms and $\Sigma_p$ is
the alphabet of predicates in a suitable first-order theory (discussed later).

We use $\{0,1\}^{\le n}$ to denote $\bigcup_{0 \le i \le n} \{0,1\}^i$. A (symbolic binary decision) tree
over $X$ and $\Sigma$ is a function $\tau\colon \{0,1\}^{\le n} \to (X \cup \Sigma)$ such that for all positions
$p \in \{0,1\}^i$ with $i < n$, we have $\tau(p) = x_{i+1}$, and for all positions $p \in \{0,1\}^n$, we have
$\tau(p) \in \Sigma$. An example of a tree $\tau$ can be found in Fig. 1b, where $\Sigma = \{v_h, v_\ell\}$,
$\tau(\epsilon) = x_1$, $\tau(0) = \tau(1) = x_2$, $\tau(00) = v_h$, and $\tau(p) = v_\ell$ for $p \in \{0,1\}^2 \setminus \{00\}$.

A (symbolic) tree automaton (TA) is a tuple $A = (S, \Delta, F)$ where $S$ is a finite
set of states, $\Delta \subseteq (S \times X \times S \times S) \cup (S \times \Sigma)$ is a transition relation, and $F \subseteq S$
is the set of root (final) states. We denote transitions from $\Delta$ as $s \xrightarrow{x} (s_0, s_1)$
and $s \xrightarrow{a} ()$ respectively. An example of a TA with the set of root states $\{s\}$ can
be found in Fig. 1a.

A run of $A$ on $\tau$ is a function $\rho\colon \{0,1\}^{\le n} \to S$ s.t. for all positions $p \in \{0,1\}^i$
with $i < n$, it holds that $\rho(p) \xrightarrow{\tau(p)} (\rho(p.0), \rho(p.1)) \in \Delta$ and for all positions
$p \in \{0,1\}^n$, it holds that $\rho(p) \xrightarrow{\tau(p)} () \in \Delta$. The run $\rho$ is accepting iff $\rho(\epsilon) \in F$
and the language of $A$ is $L(A) = \{\tau \mid A \text{ has an accepting run on } \tau\}$. Observe
that the tree in Fig. 1b is in the language of the TA $P$ in Fig. 1a with the run
$\rho$ such that $\rho(\epsilon) = s$, $\rho(0) = s_1$, $\rho(1) = s_0$, $\rho(00) = s_3$, and $\rho(p) = s_2$ for
$p \in \{0,1\}^2 \setminus \{00\}$.
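
The definitions of trees, runs, and acceptance can be exercised on the running example. The following Python sketch is only an illustration and not AUTOQ's implementation: the dictionary layout, the function names, and the string names for states and symbols are assumptions made here, while the transitions are the ones implied by the run described for Fig. 1a.

```python
# Tree automaton P of Fig. 1a. The dict layout is an assumption of this sketch:
# "inner" holds (parent, variable, left child, right child),
# "leaf" holds (state, leaf symbol), "roots" is the set of root states.
P = {
    "inner": {("s", "x1", "s1", "s0"), ("s1", "x2", "s3", "s2"), ("s0", "x2", "s2", "s2")},
    "leaf": {("s3", "vh"), ("s2", "vl")},
    "roots": {"s"},
}

def states_at(ta, tree, pos=""):
    """Return the states that some run can assign to position `pos` of `tree`."""
    label = tree[pos]
    if pos + "0" not in tree:  # `pos` is a leaf position
        return {s for (s, sym) in ta["leaf"] if sym == label}
    left = states_at(ta, tree, pos + "0")
    right = states_at(ta, tree, pos + "1")
    return {p for (p, x, l, r) in ta["inner"] if x == label and l in left and r in right}

def accepts(ta, tree):
    """A tree is accepted iff some run labels the root position with a root state."""
    return bool(states_at(ta, tree) & ta["roots"])

# The tree of Fig. 1b as a map from positions (bit strings) to labels.
tau = {"": "x1", "0": "x2", "1": "x2", "00": "vh", "01": "vl", "10": "vl", "11": "vl"}
print(accepts(P, tau))  # True
```

Computing the set of possible states bottom-up avoids enumerating runs explicitly; replacing the label at position 00 by `vl` makes the tree fall outside the language.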

Now we are ready to demonstrate how to write specifications of quantum circuits
with TAs using a running example. We assume that $C$ is a 2-qubit circuit that
amplifies the amplitude of the basis state $|00\rangle$ (under some constraint $\varphi$ over input
states) and reduces the amplitudes of other basis states. We first prepare the
precondition of $C$, which consists of a pair $(P, \varphi)$, where $P$ is a TA with the root state
$s$, a set of terms $\Sigma_t$ as the leaf alphabet, and the set of transitions from Fig. 1a,
and $\varphi$ is a first-order constraint over the variables used in $\Sigma_t$. In $\Sigma_t$, we use two
variables over complex numbers, $v_\ell$ and $v_h$, to denote the corresponding amplitude
(low and high). The constraint $\varphi$ states that $|v_h + 3v_\ell| > |2v_h|$ (required by this
circuit $C$, cf. Sect. 5.4). Recall that the TA $P$ from Fig. 1a accepts the tree from
Fig. 1b, which in turn represents the quantum state


$s = v_h |00\rangle + v_\ell |01\rangle + v_\ell |10\rangle + v_\ell |11\rangle.$ (1)


AUTOQ will execute the gates in C to transform the TA P to another TA 
R capturing the effect of executing C over all quantum states encoded in P. 
The algorithm for gate operations is almost the same as the one in [14], except 
that now the update of leaf symbols works symbolically (similarly to symbolic 
execution [27]: each leaf symbol is a term over $v_h$ and $v_\ell$ and quantum gates
change the terms by accumulating the operations that would be performed on 
them, potentially simplifying them). In this example, the TA R will accept only 
one tree representing the quantum state 


$s' = \frac{v_h + 3v_\ell}{2} |00\rangle + \frac{v_h - v_\ell}{2} |01\rangle + \frac{v_h - v_\ell}{2} |10\rangle + \frac{v_h - v_\ell}{2} |11\rangle.$ (2)


Observe that under the precondition $\varphi = |v_h + 3v_\ell| > |2v_h|$, the probability of
$|00\rangle$ is indeed increased ($|\frac{v_h + 3v_\ell}{2}|^2 > |v_h|^2$). The tree representation of $s'$ can be
found in Fig. 1c. The TA $Q$ of the postcondition can be found in Fig. 1d. The
leaf alphabet of $Q$ is the set of predicates $\Sigma_p = \{|\square| > |v_h|, |\square| < |v_\ell|\}$ where
$\square$ denotes a free variable. Observe that $Q$ accepts the tree from Fig. 1e.


2.1 High-Level Specification Language 


In AUTOQ, we provide a simple specification language that can be automati- 
cally translated to TAs. The language allows users to focus on the properties 
they want to express without the need to specify details of the TA structure. 
Our language is particularly suitable for describing sets of states with one high 
probability branch and other branches with uniformly low or zero probability, 
a very common pattern of quantum circuit’s correctness properties. For example,
in the language, we can use $(|00\rangle\colon v_h,\ |{*}\rangle\colon v_\ell)$, where “$|{*}\rangle$” denotes “other basis
states,” to define the tree language of the TA in Fig. 1a, which accepts a single
tree representing the quantum state $v_h |00\rangle + v_\ell |01\rangle + v_\ell |10\rangle + v_\ell |11\rangle$ from
Fig. 1b. Similarly, we can use $(|00\rangle\colon |\square| > |v_h|,\ |{*}\rangle\colon |\square| < |v_\ell|)$ to represent the
language of the TA in Fig. 1d. The set of all 2-qubit basis states $\{|i\rangle \mid i \in \{0,1\}^2\}$
is expressed as $\exists i \in \{0,1\}^2\colon (|i\rangle\colon 1,\ |{*}\rangle\colon 0)$ (we can see it as a predicate that is
satisfied by the described quantum states). We also allow the tensor product $\otimes$
operator, which multiplies the amplitude of the product basis states. For example,
$(|00\rangle\colon 1,\ |{*}\rangle\colon 0) \otimes (|00\rangle\colon v_h,\ |{*}\rangle\colon v_\ell) \otimes (|00\rangle\colon 1,\ |{*}\rangle\colon 0)$ represents the (singleton)
set of states compactly $\{v_h |000000\rangle + \sum_{i \in \{01,11,10\}} v_\ell |00i00\rangle\}$.
A more challenging example is to represent the set of states

$\{\, v_h |ii000\rangle + \sum_{j \in \{0,1\}^3 \wedge j \neq i} v_\ell |ij000\rangle \;\mid\; i \in \{0,1\}^3 \,\}.$ (3)

Such a set can be described with the help of the $\otimes$ and $\exists$ operators as follows:

$\exists i \in \{0,1\}^3\colon (|i\rangle\colon 1,\ |{*}\rangle\colon 0) \otimes (|i\rangle\colon v_h,\ |{*}\rangle\colon v_\ell) \otimes (|000\rangle\colon 1,\ |{*}\rangle\colon 0).$ (4)


Below is the grammar of specification spec:


spec ::= state | $\exists i \in \{0,1\}^n\colon$ state | spec, state
state ::= $(|c_1\rangle\colon t, \ldots, |c_k\rangle\colon t,\ |{*}\rangle\colon t)$ | $(|i\rangle\colon t,\ |{*}\rangle\colon t)$ | state $\otimes$ state
$t \in \Sigma$, $n \in \mathbb{N}$, and $c_1, \ldots, c_k \in \{0,1\}^n$


A spec is ill-formed when a free variable $i$ appears in state, if some basis is
repeated in the rule $(|c_1\rangle\colon t, \ldots, |c_k\rangle\colon t,\ |{*}\rangle\colon t)$, or if the previous rule contains
two bases of different lengths. If all basis states of the given length are specified
in $(|c_1\rangle\colon t, \ldots, |c_k\rangle\colon t,\ |{*}\rangle\colon t)$, the $|{*}\rangle\colon t$ part is not required any more. The
specification is then converted into a TA using a straightforward algorithm; in the
following we often confuse a TA and its specification.
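
To make the meaning of a state and of the tensor product concrete, the sketch below expands specifications into explicit maps from basis states to amplitude terms. It is an illustration under assumptions and not AUTOQ's translation to TAs: the dict encoding with the key `"*"` for "other basis states", the helper names `expand`, `mul`, and `tensor`, and the string form of terms are all hypothetical.

```python
from itertools import product

def expand(spec, n):
    """Expand a state such as {"00": "vh", "*": "vl"} into a map over all n-bit bases."""
    if any(len(b) != n for b in spec if b != "*"):
        raise ValueError("ill-formed: basis of the wrong length")
    full = {}
    for bits in product("01", repeat=n):
        basis = "".join(bits)
        if basis in spec:
            full[basis] = spec[basis]
        elif "*" in spec:
            full[basis] = spec["*"]
        else:
            raise ValueError("ill-formed: basis %s is not covered" % basis)
    return full

def mul(s, t):
    """Symbolic product of two amplitude terms, simplifying the factors 0 and 1."""
    if s == 0 or t == 0:
        return 0
    if s == 1:
        return t
    if t == 1:
        return s
    return "(%s)*(%s)" % (s, t)

def tensor(left, right):
    """Tensor product: concatenate basis states and multiply their amplitudes."""
    return {a + b: mul(s, t) for a, s in left.items() for b, t in right.items()}

anc = expand({"00": 1, "*": 0}, 2)
work = expand({"00": "vh", "*": "vl"}, 2)
state = tensor(tensor(anc, work), anc)
nonzero = {b: t for b, t in state.items() if t != 0}
print(nonzero)
```

The non-zero entries are exactly the four basis states of the 6-qubit example above: 000000 with amplitude `vh` and 000100, 001000, 001100 with amplitude `vl`.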


2.2 Complex Number Representation 


In a (pure) quantum state, the amplitude of a computational basis state is a
complex number, and the corresponding probability is the square of the absolute
value of the amplitude. For verification, we need an exact representation of
complex numbers that can be used in computers. In AUTOQ, we use a subset
of complex numbers that can be expressed by the following algebraic
encoding (cf. [14,34,46]):


$\left(\frac{1}{\sqrt{2}}\right)^k (a + b\omega + c\omega^2 + d\omega^3),$ (5)


where $a, b, c, d \in \mathbb{Z}$, $k \in \mathbb{N}$, and $\omega = e^{i\pi/4} = \cos 45^\circ + i \sin 45^\circ = \frac{\sqrt{2}}{2} + i\frac{\sqrt{2}}{2}$, the unit
vector that makes an angle of $45^\circ$ with the positive real axis in the complex plane.
A complex number is then represented by a quadruple $(a, b, c, d)$ of integers and
a normalization factor $k$. Although the considered set of complex numbers is only
a small subset of all complex numbers (it is countable, while the set of all complex
numbers is uncountable), the subset is sufficient to describe various standard
quantum gates. Currently, AUTOQ supports the set of quantum gates X, H, Y,
Z, S, T, $R_x(\frac{\pi}{2})$, $R_y(\frac{\pi}{2})$, CNOT, CZ, Toffoli (cf. the list in [14]), which already
includes a set of universal quantum gates. From the Solovay-Kitaev theorem [18], 
gates performing rotations of $\frac{\pi}{2^k}$, used, e.g., in Shor’s algorithm [33] and quantum
Fourier transform (QFT) [16], can be approximated with an error rate $\epsilon$ by
$O(\log^{3.97}(\frac{1}{\epsilon}))$-many H, CNOT, and T gates. The algebraic representation is also
sufficient to represent all reachable states in OPENQASM circuits with the set 
of supported gates, where the initial basis state is |0... 0). 
AUTOQ operates on the introduced representation of complex numbers. More
precisely, for a specification $\{P, \varphi\}\ C\ \{Q\}$, the leaf symbols of $P$ are quadruples
of integer terms $(a, b, c, d)$. We assume that all leaf symbols of $P$ share a common
normalization factor $k$, so we do not store the value of $k$ explicitly since it can
be inferred from the fact that the probability sum over all basis states is one.
Instead, we remember a constant natural number value ke, the difference of the
$k$ value between $P$ and $R$, and use it to normalize the amplitudes. Recall that
$R$ is the TA accepting all states after executing $C$ from some states accepted by
$P$. The initial value of $k$ is zero, and each application of H, $R_x(\frac{\pi}{2})$, or $R_y(\frac{\pi}{2})$
gates will increase it by one (cf. [14]). We normalize all quadruple leaf symbols
$(a, b, c, d)$ of $R$ by multiplying them with $(\frac{1}{\sqrt{2}})^{ke}$ once $R$ is computed.

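Next, we show how to compose a specification of our running example from
Fig. 1 using the algebraic representation. The specification can now be written as


$P\colon (|00\rangle\colon (v_h^a, v_h^b, v_h^c, v_h^d),\ |{*}\rangle\colon (v_\ell^a, v_\ell^b, v_\ell^c, v_\ell^d))$
$Q\colon (|00\rangle\colon |(\square_1, \square_2, \square_3, \square_4)|^2 > |(v_h^a, v_h^b, v_h^c, v_h^d)|^2,\ |{*}\rangle\colon |(\square_1, \square_2, \square_3, \square_4)|^2 < |(v_\ell^a, v_\ell^b, v_\ell^c, v_\ell^d)|^2)$,
where $|(a, b, c, d)|^2 = |a + b\omega + c\omega^2 + d\omega^3|^2$

$= \left|a + b\left(\frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2} i\right) + ci + d\left(-\frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2} i\right)\right|^2$


$= \left(a + \frac{b - d}{\sqrt{2}}\right)^2 + \left(\frac{b + d}{\sqrt{2}} + c\right)^2$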
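
The algebraic encoding of Eq. (5) and the squared-modulus formula above can be cross-checked numerically. The sketch below is an illustration only: the function names are assumptions of this example, and floating-point arithmetic is used merely to compare against the exact integer representation, which is what a verifier would actually manipulate.

```python
import cmath
import math

OMEGA = cmath.exp(1j * math.pi / 4)  # unit vector at 45 degrees

def to_complex(q, k=0):
    """Evaluate (1/sqrt(2))^k * (a + b*w + c*w^2 + d*w^3) as a Python complex."""
    a, b, c, d = q
    return (a + b * OMEGA + c * OMEGA**2 + d * OMEGA**3) / math.sqrt(2) ** k

def times_omega(q):
    """Multiply by omega exactly: since omega^4 = -1, the coefficients rotate."""
    a, b, c, d = q
    return (-d, a, b, c)

def add(p, q):
    """Add two quadruples that share the same normalization factor k."""
    return tuple(x + y for x, y in zip(p, q))

def norm_sq(q):
    """Squared modulus of a quadruple (with k = 0), following the closed form above."""
    a, b, c, d = q
    return (a + (b - d) / math.sqrt(2)) ** 2 + (c + (b + d) / math.sqrt(2)) ** 2

q = (3, -1, 2, 5)
print(abs(to_complex(times_omega(q)) - OMEGA * to_complex(q)) < 1e-9)
print(abs(norm_sq(q) - abs(to_complex(q)) ** 2) < 1e-9)
```

Multiplication by $\omega$ is the action of the T gate on the amplitude of $|1\rangle$, so it needs no floating-point arithmetic at all in this representation.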


2.3 Precise Semantics of the Specification 


As mentioned above, for verifying $\{P, \varphi\}\ C\ \{Q\}$, we start with a TA $P$
representing the set of all quantum states satisfying the precondition and compute
a TA $R$ representing the set of states reachable after executing the circuit $C$.
Then, we test whether $R$ entails $Q$ (w.r.t. $\varphi$), i.e., whether all reachable states
satisfy the postcondition.

Formally, we say that a tree $\tau_1$ entails a tree $\tau_2$ w.r.t. a first-order
formula $\varphi$, denoted as $\tau_1 \models_\varphi \tau_2$, if for all positions $p \in \{0,1\}^n$ it holds that
either (i) $\tau_1(p) = \tau_2(p)$ or (ii) $\tau_1(p) = (t_1, \ldots, t_k) \in \Sigma_t$, $\tau_2(p) = \psi \in \Sigma_p$, and
$\varphi \Rightarrow \psi[t_1/\square_1] \ldots [t_k/\square_k]$. We lift the entailment to TAs: $A_1 \models_\varphi A_2$ iff for all
trees $\tau_1 \in L(A_1)$ there exists a tree $\tau_2 \in L(A_2)$ s.t. $\tau_1 \models_\varphi \tau_2$.³
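
The leaf-wise definition of entailment can be illustrated on the running example of Fig. 1. The sketch below is not how AUTOQ decides the implication (the tool delegates that to an SMT solver); it only searches a finite grid of sample valuations for a counterexample, so a `True` result means "no counterexample found on the samples". The data layout, the names, and the choice of $(v_h - v_\ell)/2$ for the three low-amplitude leaves are assumptions of this sketch.

```python
from itertools import product

def abs2(z):
    """Squared modulus; exact for the small dyadic sample values used below."""
    return z.real ** 2 + z.imag ** 2

def leaf_entails(leaf1, leaf2, phi, samples):
    """Check one leaf position of tau_1 |=_phi tau_2 on the given sample valuations."""
    if leaf1["name"] == leaf2["name"]:  # case (i): identical symbols
        return True
    if leaf1["kind"] != "term" or leaf2["kind"] != "pred":
        return False
    # case (ii): look for a sample violating phi => psi[t/box]
    return all(leaf2["fn"](leaf1["fn"](v), v) for v in samples if phi(v))

def tree_entails(tree1, tree2, phi, samples):
    """Leaf-wise entailment of two trees given as maps from leaf positions to leaves."""
    return all(leaf_entails(tree1[p], tree2[p], phi, samples) for p in tree1)

high = {"kind": "term", "name": "(vh+3vl)/2", "fn": lambda v: (v["vh"] + 3 * v["vl"]) / 2}
low = {"kind": "term", "name": "(vh-vl)/2", "fn": lambda v: (v["vh"] - v["vl"]) / 2}
gt_vh = {"kind": "pred", "name": "|box|>|vh|", "fn": lambda box, v: abs2(box) > abs2(v["vh"])}
lt_vl = {"kind": "pred", "name": "|box|<|vl|", "fn": lambda box, v: abs2(box) < abs2(v["vl"])}

tree_R = {"00": high, "01": low, "10": low, "11": low}
tree_Q = {"00": gt_vh, "01": lt_vl, "10": lt_vl, "11": lt_vl}
phi = lambda v: abs2(v["vh"] + 3 * v["vl"]) > abs2(2 * v["vh"])

grid = range(-2, 3)
samples = [{"vh": complex(a, b), "vl": complex(c, d)} for a, b, c, d in product(grid, repeat=4)]
print(tree_entails(tree_R, tree_Q, phi, samples))
print(tree_entails(tree_R, tree_Q, lambda v: True, samples))
```

With the constraint $\varphi$ no sample violates the postcondition, whereas dropping $\varphi$ (the second call) immediately yields a counterexample, e.g. the valuation where both variables are zero.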


3 Entailment Checking 


We will now describe how we perform the entailment check $R \models_\varphi Q$. Since we
operate with trees and tree automata over symbolic values, we cannot establish
entailment by running a classical TA language inclusion test based on
complementing the automaton $Q$ first. Instead, our algorithm for testing the
entailment $R \models_\varphi Q$ is based on an on-the-fly TA inclusion checking algorithm [4, 10],


3 We never have a predicate from $\Sigma_p$ on the left-hand side of the entailment test, so
we do not need to test implication between predicates, which would be needed for
a complete procedure.
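Algorithm 1: Checking $R \models_\varphi Q$

Input: A TA $R = (S_R, \Delta_R, F_R)$, a TA $Q = (S_Q, \Delta_Q, F_Q)$, a formula $\varphi$
Output: true if $R \models_\varphi Q$, false otherwise

 1  Processed ← $\emptyset$;
 2  Worklist ← Min{$(s_r, U_q)$ | $s_r \xrightarrow{t_r} () \in \Delta_R$,
 3      $U_q = \{u_q \in S_Q \mid u_q \xrightarrow{t_r} () \in \Delta_Q \vee \exists\, u_q \xrightarrow{p_q} () \in \Delta_Q\colon \varphi \Rightarrow p_q[t_r/\square]\}$};
 4  while Worklist $\neq \emptyset$ do
 5      $(s_r, U_q)$ ← Worklist.pop();
 6      if $s_r \in F_R \wedge U_q \cap F_Q = \emptyset$ then return false;
 7      Processed ← Min(Processed $\cup\ \{(s_r, U_q)\}$);
 8      tmp ← ($\{(s_r, U_q)\} \times$ Processed) $\cup$ (Processed $\times\ \{(s_r, U_q)\}$);
 9      foreach $((s_r^1, U_q^1), (s_r^2, U_q^2)) \in$ tmp, $x \in X$ do
10          $H$ ← $\{s'_r \in S_R \mid s'_r \xrightarrow{x} (s_r^1, s_r^2) \in \Delta_R\}$;
11          $U'_q$ ← $\{s'_q \in S_Q \mid \exists s_q^1 \in U_q^1, \exists s_q^2 \in U_q^2\colon s'_q \xrightarrow{x} (s_q^1, s_q^2) \in \Delta_Q\}$;
12          foreach $s'_r \in H$ s.t. $(s'_r, U'_q) \notin \lfloor$Processed $\cup$ Worklist$\rfloor$ do
13              Worklist ← Min(Worklist $\cup\ \{(s'_r, U'_q)\}$);
14  return true;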


which avoids complementation. The on-the-fly inclusion-checking algorithm can
be seen as an optimization of the classical construction, which would establish
$L(R) \cap \overline{L(Q)} = \emptyset$ by first computing the complement $Q^C$ of $Q$ (using a bottom-up
TA determinization), followed by computing the intersection $A_\cap$ of $Q^C$ and $R$,
and, finally, checking language emptiness of $A_\cap$. In particular, the on-the-fly
inclusion checking algorithm can be seen as doing all the operations at once.
Furthermore, the algorithms in [4,10] also make use of the so-called antichains
and TA simulation to prune the explored state space.

Our modification of the inclusion algorithm to test TA entailment, given in
Algorithm 1, mainly differs from [4,10] in the way initial sets of state pairs are
computed on Line 3. In particular, we match a state $s_r$ that can perform a leaf
transition over $t_r$ in $R$ with the set $U_q$ of all states in $Q$ that can perform a leaf
transition either over $t_r$ or over a predicate $p_q$ such that $\varphi \Rightarrow p_q[t_r/\square]$ (we use
$p_q[t_r/\square]$ for a tuple $t_r$ to denote the substitution of the tuple’s components into
the corresponding free variables of $p_q$).

After that, the algorithm performs a simultaneous bottom-up traversal
through $R$ (represented by states $s_r$) and the determinized version of $Q$
(represented by sets of states $U_q$). For each such pair $(s_r, U_q)$, the algorithm first
checks whether $s_r$ is a root state and $U_q$ does not contain any root state (cf.
Line 6; this would mean that $R$ accepts some tree that is not accepted by $Q$).
If this does not hold, then the algorithm tries to find all already processed pairs
that can make a transition with $(s_r, U_q)$ (cf. Line 8) and continues from all such
pairs. Each bottom-up successor $(s'_r, U'_q)$ is then added to Worklist in the case
it has not been seen previously (cf. Line 13).

The algorithm uses the function Min (cf. Lines 3, 7, and 13) to minimize the
sets Worklist and Processed w.r.t. a subsumption relation, and the downward
closure $\lfloor$Processed $\cup$ Worklist$\rfloor$ on Line 12 to prune the explored state space.
Due to lack of space, we refer to the works [4,10] for more details about these
optimizations.
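
The bottom-up traversal described above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the algorithm as implemented in AUTOQ: it omits the Min and downward-closure pruning, and it replaces the SMT query of Line 3 by a pluggable `leaf_ok` callback that defaults to symbol equality. The dict layout of automata and all names are assumptions of this sketch.

```python
def entails(R, Q, leaf_ok=lambda t_r, sym_q: t_r == sym_q):
    """Return True iff every tree of R is matched by Q (no antichain pruning)."""
    variables = {x for (_, x, _, _) in R["inner"]}
    # Initial pairs: an R leaf state with all Q states whose leaf symbol matches it.
    worklist = [
        (s_r, frozenset(u for (u, sym) in Q["leaf"] if leaf_ok(t_r, sym)))
        for (s_r, t_r) in R["leaf"]
    ]
    processed = set()
    while worklist:
        pair = worklist.pop()
        if pair in processed:
            continue
        s_r, u_q = pair
        if s_r in R["roots"] and not (u_q & Q["roots"]):
            return False  # R accepts a tree for which Q has no accepting run
        processed.add(pair)
        combos = [(pair, o) for o in processed] + [(o, pair) for o in processed]
        for (s1, u1), (s2, u2) in combos:
            for x in variables:
                parents_r = {p for (p, y, l, r) in R["inner"] if y == x and l == s1 and r == s2}
                if not parents_r:
                    continue
                parents_q = frozenset(
                    p for (p, y, l, r) in Q["inner"] if y == x and l in u1 and r in u2
                )
                for p in parents_r:
                    if (p, parents_q) not in processed:
                        worklist.append((p, parents_q))
    return True

# P is the TA of Fig. 1a; ALL_LOW accepts only the tree whose four leaves are "vl".
P = {
    "inner": {("s", "x1", "s1", "s0"), ("s1", "x2", "s3", "s2"), ("s0", "x2", "s2", "s2")},
    "leaf": {("s3", "vh"), ("s2", "vl")},
    "roots": {"s"},
}
ALL_LOW = {
    "inner": {("q", "x1", "q0", "q0"), ("q0", "x2", "q2", "q2")},
    "leaf": {("q2", "vl")},
    "roots": {"q"},
}
print(entails(P, P), entails(P, ALL_LOW), entails(ALL_LOW, P))
```

Passing a different `leaf_ok` (for example one that consults a solver to decide whether a term satisfies a predicate under $\varphi$) turns the plain inclusion test into the entailment test; the pruning of [4,10] only affects efficiency, not the result.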


4 Architecture 


We illustrate the architecture of AUTOQ in Fig. 2. The tool is written in C++
and uses the following external tools: the TA library VATA [28] for efficient
testing of TA inclusion (when the postcondition uses only the term alphabet $\Sigma_t$)
and the SMT solver Z3 [19] for entailment checking of leaf symbols in
Algorithm 1. We allow any theory solver supported by Z3. In our experiment, we
use QF_NIRA. AUTOQ takes as an input a quantum circuit in the OPENQASM
format accompanied with the specification written as tree automata (.aut files)
or the high-level specification language (.hsl files) introduced in Sect. 2.1.

[Figure 2 diagram: the inputs P.aut or P.hsl and φ.smt (precondition), C.qasm (circuit), and
Q.aut or Q.hsl (postcondition) pass through the Preprocessor, the Circuit Executor, and the
Entailment Checker (Algorithm 1), which reports Verified / Bug found]

Fig. 2. The architecture of AUTOQ. The input verification problem is $\{P, \varphi\}\ C\ \{Q\}$.

Preprocessor reads the input files (.aut, .smt, .qasm, and .hsl files), translates
specifications in the .hsl files into tree automata, and stores them using
AUTOQ’s internal data structures. Circuit Executor then reads the circuit $C$ and
the TA $P$ and generates another TA $R$ obtained as the result after executing
$C$ from states in $P$, using the approach of [14] with the symbolic extension
discussed in Sect. 2. AUTOQ can also output the TA $R$ for further analysis. Finally,
Entailment Checker checks whether $R \models_\varphi Q$ and reports “verified” when the
entailment holds and “bug found” otherwise.


5 Use Cases 


In this section, we describe several use cases of quantum algorithms and their 
important properties that we were able to verify using AUTOQ fully automati- 
cally. We focus on the use of symbolic TA in this set of experiments and refer 
the readers to [14] for other experimental results. A selection of the obtained 
results is given in Table 1. An artifact that allows reproduction of the results is 
available as [13]. 


5.1 Hadamard Square is Identity 


Our first use case shows that the single qubit gate $C$ that runs two consecutive
H gates has the same effect as an identity matrix. We use the specification
$\{P, \varphi\}\ C\ \{Q\}$ with


$P\colon (|0\rangle\colon (v_a, v_b, v_c, v_d),\ |1\rangle\colon (u_a, u_b, u_c, u_d))$, $\varphi\colon \mathit{true}$,
$Q\colon (|0\rangle\colon (\square_a, \square_b, \square_c, \square_d) = (v_a, v_b, v_c, v_d),\ |1\rangle\colon (\square_a, \square_b, \square_c, \square_d) = (u_a, u_b, u_c, u_d))$


In this simple example, the precondition $P$ encodes an infinite number of quantum
states, which is not expressible using the technique in [14]. We also included
a buggy version by altering one of the H gates, and AUTOQ managed to detect
the injected bug. The results can be found in rows $H^2$ in Table 1.
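
Why the property holds can be seen directly on the integer-quadruple representation. The sketch below is a hand-written illustration, not AUTOQ's gate implementation: it assumes a single-qubit state stored as two quadruples plus the shared factor $k$, and the function names are hypothetical. Applying H adds and subtracts the two quadruples and increases $k$ by one; applying it twice doubles every coefficient while adding 2 to $k$, and since $(1/\sqrt{2})^2 = 1/2$ this is the original state.

```python
def hadamard(state):
    """Apply H to (amp0, amp1, k): amplitudes become (amp0 +/- amp1) with k + 1."""
    amp0, amp1, k = state
    plus = tuple(x + y for x, y in zip(amp0, amp1))
    minus = tuple(x - y for x, y in zip(amp0, amp1))
    return plus, minus, k + 1

def reduce_factor(state):
    """Cancel common factors of 2 against (1/sqrt(2))^2 = 1/2."""
    amp0, amp1, k = state
    while k >= 2 and all(x % 2 == 0 for x in amp0 + amp1):
        amp0 = tuple(x // 2 for x in amp0)
        amp1 = tuple(x // 2 for x in amp1)
        k -= 2
    return amp0, amp1, k

start = ((1, 2, -3, 0), (0, 5, 1, -2), 0)  # arbitrary sample coefficients
twice = hadamard(hadamard(start))
print(twice)
print(reduce_factor(twice) == start)  # True
```

The tool establishes the same identity for symbolic coefficients, which covers all input states at once rather than one sample.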


5.2 Zero Imaginary Part of Amplitudes 


One property, which is shared by multiple algorithms, e.g., Bernstein- 
Vazirani’s [8] and Grover’s algorithm [25], is that the imaginary part of all 
amplitudes of the result is zero. 

Let us focus on Bernstein-Vazirani’s algorithm [8], which finds a secret
bit-string $s$ from an oracle using a single query. The algorithm begins with the
quantum state $|0^n\rangle$, where $n$ is the length of $s$, and ends with the quantum state $|s\rangle$.
The amplitudes of all basis states are either zero or one; the imaginary part of the
amplitudes is, therefore, always zero. For a three-qubit circuit $C$ implementing the
algorithm, we can therefore use the specification $\{P, \varphi\}\ C\ \{Q\}$ with


$P\colon (|000\rangle\colon (1,0,0,0),\ |{*}\rangle\colon (0,0,0,0))$, $\varphi\colon \mathit{true}$, $Q\colon (|{*}\rangle\colon \psi_{im})$,


where $\psi_{im} = (\square_b = -\square_d \wedge \square_c = 0)$ (it will also be used later). In the definition
of $P$, recall that we use the integer-quadruple representation of complex numbers
(cf. Eq. (5)). In the postcondition $Q$, the free variables $\square_b, \square_c, \square_d$ are to be
substituted by the corresponding terms in the obtained integer term quadruple
$(a, b, c, d)$ in the entailment check. Note that $(a, b, c, d)$ represents the complex
number $\left(a + \frac{b - d}{\sqrt{2}}\right) + i\left(\frac{b + d}{\sqrt{2}} + c\right)$ (obtained from Eq. (5)). Because
$a, b, c, d$ are all integers, for the imaginary part to be zero, it must hold that
$c = 0$ and $b = -d$.

When we run $C$ from $P$, we obtain a TA $R$ encoding $(|010\rangle\colon (1,0,0,0),\ |{*}\rangle\colon (0,0,0,0))$
and the entailment $R \models_\varphi Q$ holds. See the rows BV(n) in Table 1
for the results of verifying the algorithm for circuits with secrets of size $n$. As in the
previous example, we also included a buggy version to demonstrate AUTOQ’s
bug-finding capability. We can see that AUTOQ could verify the algorithm for secrets
of a quite large size.


5.3 Probability of Measuring the Correct Answer 


Grover’s algorithm [25] assumes a Boolean function f over n bits with only one 
satisfying assignment s and an oracle that evaluates f for a given input. The 
algorithm finds $s$ with a high probability, say > 0.9, using only $O(\sqrt{2^n})$ oracle
queries. The algorithm works iteratively, where each Grover iteration queries the
oracle once and amplifies the amplitude of $|s\rangle$. First, let $C$ be a 6-qubit circuit
Table 1. Results of verifying our use cases with AUTOQ. The maximum peak memory
consumption was 52 MiB for Grover_all(9). In most cases, the time of entailment was
negligible, with the exception of Grover_all circuits. For instance, Grover_all(8) takes
2m18s for entailment checking (70% of the total time) and Grover_all(9) takes 21m36s
for entailment checking (85% of the total time).


circuit             qubits  gates    property              result  time
$H^2$               1       2        $H^2 = I$             OK      0.22s
$H^2$ (bug)         1       2        $H^2 = I$             Bug     0.17s
BV(2)               2       6        $\psi_{im}$           OK      0.11s
BV(2) (bug)         2       6        $\psi_{im}$           Bug     0.15s
BV(100)             100     251      $\psi_{im}$           OK      10.90s
BV(1,000)           1,000   2,500    $\psi_{im}$           OK      198m28s
Grover_all(3)       9       64       P(Correct) > 0.9      OK      0.40s
Grover_all(8)       24      939      P(Correct) > 0.9      OK      3m18s
Grover_all(9)       27      1,492    P(Correct) > 0.9      OK      25m16s
Grover_single(3)    6       54       P(Correct) > 0.9      OK      0.34s
Grover_single(16)   32      28,159   P(Correct) > 0.9      OK      2m21s
Grover_single(18)   36      63,537   P(Correct) > 0.9      OK      6m37s
Grover_single(20)   40      141,527  P(Correct) > 0.9      OK      19m57s
Grover_iter(2)      3       13       P(Correct) Increased  OK      0.40s
Grover_iter(18)     36      157      P(Correct) Increased  OK      1.95s
Grover_iter(50)     100     445      P(Correct) Increased  OK      47.76s
Grover_iter(75)     150     671      P(Correct) Increased  OK      3m29s
Grover_iter(100)    200     895      P(Correct) Increased  OK      10m53s


implementing Grover’s search with the satisfying assignment s = 010, where the 
first three qubits of C are the work tape, and the following three are the ancillae. 
We use the following specification: 


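$P\colon (|000000\rangle\colon \mathbf{1},\ |{*}\rangle\colon \mathbf{0})$ where $\mathbf{1} = (1,0,0,0)$ and $\mathbf{0} = (0,0,0,0)$, $\varphi\colon \mathit{true}$,
$Q\colon (|010\rangle\colon |\square_a|^2 > 0.9 \wedge \psi_{im},\ |{*}\rangle\colon |\square_a|^2 < 0.1 \wedge \psi_{im}) \otimes (|000\rangle\colon \mathbf{1},\ |{*}\rangle\colon \mathbf{0})$.


Note that the postcondition $Q$ also checks that all amplitudes in the result of
the algorithm have a zero imaginary part (using $\psi_{im}$). See rows Grover_single(n)
in Table 1 for the results on circuits for $n$-bit functions $f$ and a single oracle.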

Next, we also show the correctness of Grover’s algorithm w.r.t. all possible
3-qubit oracles. Let $C'$ be a 9-qubit circuit implementing the algorithm, where
the first three qubits are used for oracle generation, and the following six are the
work tape and ancillae, similarly to Grover_single. Our specification is now


$P\colon \exists i \in \{0,1\}^3\colon (|i000000\rangle\colon \mathbf{1},\ |{*}\rangle\colon \mathbf{0})$, $\varphi\colon \mathit{true}$,
$Q\colon \exists i \in \{0,1\}^3\colon (|i\rangle\colon \mathbf{1},\ |{*}\rangle\colon \mathbf{0}) \otimes (|i\rangle\colon |\square_a|^2 > 0.9 \wedge \psi_{im},\ |{*}\rangle\colon |\square_a|^2 < 0.1 \wedge \psi_{im}) \otimes (|000\rangle\colon \mathbf{1},\ |{*}\rangle\colon \mathbf{0})$.


Note that in the postcondition, we use $i$ to relate the oracle value and the value
on the work tape. The results are in rows Grover_all(n) in Table 1.


5.4 Increasing Amplitude of the Correct Answer 


Above, we show that we are able to automatically verify moderate-sized circuits
for Grover’s algorithm for the values of $n$ up to 9 (for Grover_all) and 20 (for
Grover_single), which is quite large, but we have difficulties going beyond that
because the size of the circuit is $O(\sqrt{2^n})$. Therefore, we also verify the
algorithm w.r.t. a weaker property, namely that in one iteration, the amplitude
of the correct answer will increase.

Consider a function $f$ over 2 bits with 01 being the only satisfying assignment
and let $C$ be a 4-qubit circuit encoding one Grover iteration, with two qubits as
the work tape and two ancilla qubits. From Grover’s correctness proof [25], we
can derive that when $v_\ell > 0 \wedge v_h > 0 \wedge (2^n - 1)v_\ell > v_h$, a correct implementation
will increase the probability of $|01\rangle$ and reduce others. We specify the verification
problem as follows:

$P\colon (|01\rangle\colon (v_h,0,0,0),\ |{*}\rangle\colon (v_\ell,0,0,0)) \otimes (|00\rangle\colon \mathbf{1},\ |{*}\rangle\colon \mathbf{0})$,

$\varphi\colon v_\ell > 0 \wedge v_h > 0 \wedge (2^2 - 1)v_\ell > v_h$,


$Q\colon (|01\rangle\colon |\square_a| > |v_h| \wedge \psi_{im},\ |{*}\rangle\colon |\square_a| < |v_\ell| \wedge \psi_{im}) \otimes (|00\rangle\colon \mathbf{1},\ |{*}\rangle\colon \mathbf{0})$


The results can be found in rows Grover_iter(n) in Table 1. We can see that
verification of one Grover iteration w.r.t. the weaker (but still quite useful) property
scales much better than verification of full Grover’s circuits, scaling to sizes of
$n \ge 100$.
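
The effect of one iteration on the work-tape amplitudes can be reproduced numerically. The sketch below uses the textbook formulation of a Grover iteration (an oracle that flips the sign of the marked amplitude, followed by inversion about the mean); it is an assumption that the verified circuit behaves this way up to a global phase, and the function name and sample values are chosen only for illustration.

```python
import math

def grover_iteration(amps, marked):
    """One textbook Grover iteration on a list of real amplitudes."""
    flipped = [-a if i == marked else a for i, a in enumerate(amps)]  # oracle
    mean = sum(flipped) / len(flipped)
    return [2 * mean - a for a in flipped]  # inversion about the mean

# Two work qubits, marked assignment 01 (index 1); vh and vl satisfy the constraint
# vl > 0, vh > 0, 3*vl > vh, and the state is normalized.
vh = 0.8
vl = math.sqrt((1 - vh ** 2) / 3)
new = grover_iteration([vl, vh, vl, vl], marked=1)
print(new[1] > vh, all(abs(new[i]) < vl for i in (0, 2, 3)))
```

For this formulation the marked amplitude becomes $(v_h + 3v_\ell)/2$, so the amplitude of $|01\rangle$ grows exactly when $3v_\ell > v_h$, which is the third conjunct of $\varphi$ for $n = 2$.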


6 Conclusion 


We presented a specification language for specifying useful properties of quantum 
circuits and a tool AUTOQ that can establish the correctness of the specification 
using an approach combining the technique from [14] with symbolic execution. 
Using the tool, we were able to fully automatically verify several important 
properties of a selection of quantum circuits. To the best of our knowledge, 
for some of the properties, we are the first ones that could verify them fully 
automatically. 
Abstract. Zero Knowledge Proofs (ZKPs) are cryptographic protocols
by which a prover convinces a verifier of the truth of a statement without 
revealing any other information. Typically, statements are expressed in 
a high-level language and then compiled to a low-level representation on 
which the ZKP operates. Thus, a bug in a ZKP compiler can compro- 
mise the statement that the ZK proof is supposed to establish. This paper 
takes a step towards ZKP compiler correctness by partially verifying 
a field-blasting compiler pass, a pass that translates Boolean and bit- 
vector logic into equivalent operations in a finite field. First, we define 
correctness for field-blasters and ZKP compilers more generally. Next, 
we describe the specific field-blaster using a set of encoding rules and 
define verification conditions for individual rules. Finally, we connect the 
rules and the correctness definition by showing that if our verification 
conditions hold, the field-blaster is correct. We have implemented our 
approach in the CirC ZKP compiler and have proved bounded versions 
of the corresponding verification conditions. We show that our partially 
verified field-blaster does not hurt the performance of the compiler or its 
output; we also report on four bugs uncovered during verification. 


1 Introduction 


Zero-Knowledge Proofs (ZKPs) are powerful tools for building privacy-pre- 
serving systems. They allow one entity, the prover P, to convince another, the 
verifier V, that some secret data satisfies a public property, without revealing 
anything else about the data. ZKPs underlie a large (and growing!) set of criti- 
cal applications, from billion-dollar private cryptocurrencies, like Zcash [24,53] 
and Monero [2], to research into auditable sealed court orders [20], private gun 
registries [26], privacy-preserving middleboxes [23], and zero-knowledge proofs 
of exploitability [11]. This breadth of applications is possible because of the
generality of ZKPs. In general, P knows a secret witness $w$, whereas V knows a
property $\phi$ and a public instance $x$. P must show that $\phi(x, w) = \top$. Typically,
$x$ and $w$ are vectors of variables in a finite field $\mathbb{F}$, and $\phi$ can be any system of
equations over the variables, using operations $+$ and $\times$. Because $\phi$ itself is an
input to P and V, and because of the expressivity of field equations, a single
implementation of P and V can serve many different purposes.

Humans find it difficult to express themselves directly with field equations,
so they use ZKP compilers. A ZKP compiler converts a high-level predicate $\phi'$
into an equivalent system of field equations $\phi$. In other words, a ZKP compiler
generalizes a ZKP: by compiling $\phi'$ to $\phi$ and then using a ZKP for $\phi$, one obtains
a ZKP for $\phi'$. There are many industrial [3,5,6,14,21,45,55,66] and academic
[4,18,28,29,46,48,50,54,63] ZKP compilers.

The correctness of a ZKP compiler is critical for security (a bug in the
compiler could admit proofs of false statements), but verification is challenging
for three reasons. First, the definition of correctness for a ZKP compiler is
non-trivial; we discuss it later in this section. Second, ZKP compilers span multiple
domains. The high-level predicate $\phi'$ is typically expressed in a language with
common types such as Booleans and fixed-width integers, while the output $\phi$ is
over a large, prime-order field. Thus, any compiler correctness definition must
span these domains. Third, ZKP compilers are evolving and performance-critical;
verification must not inhibit future changes or degrade compiler performance.

In this work, we develop tools for automatically verifying the field-blaster of 
a ZKP compiler. A ZKP compiler’s field-blaster is the pass that converts from a 
formula over Booleans, fixed-width integers, and finite-field elements, to a system 
of field equations; as a transformation from bit-like types to field equations, the 
field-blaster exemplifies the challenge of cross-domain verification. 
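
To give a feeling for what "converting bit-like types to field equations" means, the sketch below shows the standard textbook way of expressing Boolean operations as arithmetic over a prime field. It is an illustration only: the prime, the function names, and the choice of encodings are assumptions of this example and are not claimed to be the encoding rules used by CirC's field-blaster.

```python
P = 2**31 - 1  # a small prime for illustration; deployed systems use far larger fields

def is_bit(a):
    """Field constraint a * (a - 1) = 0, which holds exactly for a in {0, 1}."""
    return a * (a - 1) % P == 0

def f_and(a, b):
    """AND of two bits as a product."""
    return a * b % P

def f_not(a):
    """NOT of a bit as 1 - a."""
    return (1 - a) % P

def f_xor(a, b):
    """XOR of two bits as a + b - 2ab."""
    return (a + b - 2 * a * b) % P

def bits_to_field(bits):
    """A little-endian bit-vector as the field element sum(b_i * 2^i)."""
    return sum(b << i for i, b in enumerate(bits)) % P

for a in (0, 1):
    for b in (0, 1):
        assert f_and(a, b) == (a & b) and f_xor(a, b) == (a ^ b)
print(is_bit(1), is_bit(2), bits_to_field([1, 0, 1]))
```

The exhaustive loop over the four Boolean inputs is a miniature version of the kind of check that a verification condition for a single encoding rule has to establish.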

Our paper makes three contributions. First, we formulate a precise correct- 
ness definition for a ZKP compiler. Our definition ensures that a correct compiler 
preserves the completeness and soundness of the underlying ZK proof system. 
More specifically, given a ZK proof system where statements are specified in a 
low-level language L, and a compiler from a high-level language H to L, if the 
compiler is correct by our definition, it extends the ZK proof system’s soundness 
and completeness properties to statements in H. Further, our definition is pre- 
served under sequential composition, so proving the correctness of each compiler 
pass individually suffices to prove correctness of the compiler itself. 

Second, we give an architecture for a verifiable field-blaster. In our architec- 
ture, a field-blaster is a set of “encoding rules.” We give verification conditions 
(VCs) for these rules, and we show that if the VCs hold, then the field-blaster 
is correct. Our approach supports automated verification because (bounded ver- 
sions of) the VCs can be checked automatically. This reduces both the up-front 
cost of verification and its maintenance cost. 

Third, we do a case study. Using our architecture, we implement a new 
field-blaster for CirC [46] (“SIR-see”), an infrastructure used by state-of-the- 
art ZKP compilers. We verify bounded versions of our field-blaster’s VCs using 
SMT-based finite-field reasoning [47], and show that our field-blaster does not 
compromise CirC’s performance. We also report on four bugs that our verifica- 
tion effort uncovered, including a soundness bug that allowed the prover to “lie” 
about the results of certain bit-vector comparisons. We note that the utility of 
our techniques is not limited to CirC: most ZKP compilers include something 
like the field-blaster we describe here. 


1 Roughly speaking, a ZK proof system is complete if it is possible to prove every true 
statement, and is sound if it is infeasible to prove false ones. 

In the next sections, we discuss related work (Sect. 1.1), give background on 
ZKPs and CirC (Sect. 2), present a field-blasting example (Sect. 3), describe our 
architecture (Sect. 4), give our verification conditions (Sect. 5), and present the 
case study (Sect. 6). 


1.1 Related Work 


Verified Compilers. There is a rich body of work on verifying the correctness of 
traditional compilers. We focus on compilation for ZKPs; this requires different 
correctness definitions that relate bit-like types to prime field elements. In the 
next paragraphs, we discuss more fine-grained differences. 

Compiler verification efforts fall into two broad categories: automated—verif- 
ication leveraging automated reasoning solvers—and foundational—manual ver- 
ification using proof assistants (e.g., Coq [8] or Isabelle [44]). CompCert [36], 
for example, is a Coq-verified C compiler with verified optimization passes 
(e.g., [40]). Closest to our work is backend verification, which proves correct the 
translation from an intermediate representation to machine code. CompCert’s 
lowering [37] is verified, as is CakeML’s [31] lowering to different ISAs [19,57]. 
While such foundational verification offers strong guarantees, it imposes a heavy 
proof burden; creating CompCert, for example, took an expert team eight 
years [56], and any updates to compiler code require updates to proofs. 

Automated verification, in contrast, does not require writing and maintaining 
manual proofs (see footnote 2). Cobalt [34], Rhodium [35], and PEC [32] are domain-specific 
languages (DSLs) for writing automatically-verified compiler optimizations and 
analyses. Most closely related to our work is Alive [39], a DSL for expressing 
verified peephole optimizations, local rewrites that transform snippets of LLVM 
IR [1] to better-performing ones. Alive addresses transformations over fixed types 
(while we address lowering to finite field equations) and formulates correctness 
in the presence of undefined behavior (while we formulate correctness for ZKPs). 
Beyond Alive, Alive2 [38] provides translation validation [41,51] for LLVM [33], 
and VeRA [10] verifies range analysis in the Firefox JavaScript engine. 

There is also work on verified compilation for domains more closely related 
to ZKPs. The Porcupine [15] compiler automatically synthesizes representations 
for fully-homomorphic encryption [62], and Giallar [58] proves that optimization 
passes in the Qiskit [60] quantum compiler are semantics-preserving. While these 
works compile from high-level languages to circuit representations, the correct- 
ness definitions for their domains do not apply to ZKP compilers. 


Verified Compilation to Cryptographic Proofs. Prior works on verified compilation 
for ZKPs (or similar) take the foundational approach (with attendant 
proof maintenance burdens), and they do not formulate a satisfactory definition 
of compiler correctness. PinocchioQ [18] builds on CompCert [36]. The 
authors formulate a correctness definition that preserves the existential soundness 
of a ZKP but does not consider completeness, knowledge soundness, or 
zero-knowledge (see Sect. 2.2). Leo [14] is a ZKP compiler that produces (partial) 
ACL2 [27] proofs of correct compilation; work to emit proofs from its field-blaster 
is ongoing. 


2 Automated verification generally leverages solvers. This is a particularly appealing 
approach in our setting, since CirC (our compiler infrastructure of interest) already 
supports compilation to SMT formulas. 

Recent work defines security for reductions of knowledge [30]. These let P 
convince V that it knows a witness for an instance of relation $R_1$ by proving it 
knows a witness for an instance of an easier-to-prove relation $R_2$. In a reduction 
of knowledge, P and V interact to derive $R_2$ using V’s randomness (e.g., proving that 
two polynomials are nonzero w.h.p. by proving that a random linear combination 
of them is), whereas ZKP compilers run ahead of time and non-interactively. 

Further afield, Ecne [65] is a tool that attempts to verify that the input to 
a ZKP encodes a deterministic computation. It does not consider any notion 
of a specification of the intended behavior. A different work [25] attempts to 
automatically verify that a “widget” given to a ZKP meets some specification. 
They consider widgets that could be constructed manually or with a compiler. 
Our focus is on verifying a compiler pass. 


2 Background 


2.1 Logic 


We assume usual terminology for many-sorted first-order logic with equality 
([17] gives a complete presentation). We assume every signature includes the sort 
Bool, constants True and False of sort Bool, and symbol family $\approx_\sigma$ (abbreviated 
$\approx$) with sort $\sigma \times \sigma \to \mathsf{Bool}$ for each sort $\sigma$. We also assume a family of 
conditionals: symbols $ite_\sigma$ (“if-then-else”, abbreviated $ite$) of sort 
$\mathsf{Bool} \times \sigma \times \sigma \to \sigma$. 

A theory is a pair $T = (\Sigma, \mathbf{I})$, where $\Sigma$ is a signature and $\mathbf{I}$ is a class of 
$\Sigma$-interpretations. A $\Sigma$-formula is a term of sort Bool. A $\Sigma$-formula $\phi$ is satisfiable 
(resp., unsatisfiable) in $T$ if it is satisfied by some (resp., no) interpretation 
in $\mathbf{I}$. We focus on two theories. The first is $T_{BV}$, the SMT-LIB theory of 
bit-vectors [52,61], with signature $\Sigma_{BV}$ including a bit-vector sort $\mathsf{BV}_{[n]}$ for each 
$n > 0$, with bit-vector constants $c_{[n]}$ of sort $\mathsf{BV}_{[n]}$ for each $c \in [0, 2^n - 1]$, and 
operators including & and | (bitwise and, or) and $+_{[n]}$ (addition modulo $2^n$). We 
write $t[i]$ to refer to the $i$-th bit of bit-vector $t$, where $t[0]$ is the least-significant 
bit. The other theory is $T_{F_p}$, which is the theory corresponding to the finite field 
of order $p$, for some prime $p$ [47]. This theory has signature $\Sigma_{F_p}$, containing the 
sort $\mathsf{FF}_p$, constant symbols $0, \dots, p - 1$, and operators $+$ and $\times$. 

In this paper, we assume all interpretations interpret sorts and symbols in 
the same way. We write dom(v) for the set interpreting the sort of a variable 
v. We assume that Bool, True, and False are interpreted as $\{\top, \bot\}$, $\top$, and 
$\bot$, respectively; $\Sigma_{BV}$-interpretations follow the SMT-LIB standard; and 
$\Sigma_{F_p}$-interpretations interpret symbols as the corresponding elements and operations 
in $\mathbb{F}_p$, a finite field of order p (for concreteness, this could be the integers modulo 
p). Note that only the values of variables can vary between two interpretations. 

[Figure: Setup($\phi$) outputs a proving key pk and a verifying key vk. The prover 
P($\phi$, x, w) runs Prove(pk, x, w) and sends the proof $\pi$ to the verifier V($\phi$, x), 
which runs Verify(vk, x, $\pi$).] 

Fig. 1. The information flow for a zero-knowledge proof. 


For a signature $\Sigma$, let $t$ be a $\Sigma$-term of sort $\sigma$, with free variables $x_1, \dots, x_n$, 
respectively of sort $\sigma_1, \dots, \sigma_n$. We define the function 
$\hat{t} : \mathrm{dom}(x_1) \times \cdots \times \mathrm{dom}(x_n) \to \mathrm{dom}(t)$ as follows. 
Let $\mathsf{x} \in \mathrm{dom}(x_1) \times \cdots \times \mathrm{dom}(x_n)$. Let $M$ be an 
interpretation that interprets each $x_i$ as $\mathsf{x}_i$. Then $\hat{t}(\mathsf{x}) = t^M$ (i.e., the interpretation 
of $t$ in $M$). For example, the term $t = a \wedge \neg a$ defines 
$\hat{t} : \mathsf{Bool} \to \mathsf{Bool} = \lambda \mathsf{a}.\ \bot$. 
In the following, we follow the convention used above in using the standard font 
(e.g., $x$) for logical variables and a sans serif font (e.g., $\mathsf{x}$) to denote meta-variables 
standing for values (i.e., elements of $\sigma^M$ for some $\sigma$ and $M$). Also, abusing notation, 
we’ll conflate single variables (of both kinds) with vectors of variables when 
the distinction doesn’t matter. Note that a formula $\phi$ is satisfiable if there exist 
values $\mathsf{x}$ such that $\hat{\phi}(\mathsf{x}) = \top$. It is valid if for all values $\mathsf{x}$, $\hat{\phi}(\mathsf{x}) = \top$. 

For terms $s, t$ and variable $x$, $t[x \mapsto s]$ denotes $t$ with all occurrences of $x$ 
replaced with $s$. For a sequence of variable-term pairs, 
$S = (x_1 \mapsto s_1, \dots, x_n \mapsto s_n)$, $t[S]$ is defined to be 
$t[x_1 \mapsto s_1] \cdots [x_n \mapsto s_n]$. 
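
As a small illustration of how satisfiability and validity are phrased in terms of the function $\hat{\phi}$, the sketch below enumerates every Boolean value for a term's free variables. It is only an illustration under stated assumptions: the helper names `satisfiable` and `valid` are ours, and plain enumeration only works for Boolean-sorted variables.

```python
from itertools import product
from typing import Callable

def satisfiable(t_hat: Callable[..., bool], arity: int) -> bool:
    """True if some assignment of Boolean values makes t_hat true."""
    return any(t_hat(*xs) for xs in product([False, True], repeat=arity))

def valid(t_hat: Callable[..., bool], arity: int) -> bool:
    """True if every assignment of Boolean values makes t_hat true."""
    return all(t_hat(*xs) for xs in product([False, True], repeat=arity))

# The example term from the text: t = a AND (NOT a), whose function is constantly false.
contradiction = lambda a: a and not a
# A second, hypothetical term for contrast: a OR (NOT a).
excluded_middle = lambda a: a or not a

print(satisfiable(contradiction, 1), valid(excluded_middle, 1))
```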


2.2 Zero Knowledge Proofs 


As mentioned above, zero-knowledge proofs (ZKPs) make it possible to prove 
that some secret data satisfies a public property—without revealing the data 
itself. See [59] for a full presentation; we give a brief overview here, and then 
describe how general-purpose ZKPs are used. 


Overview and Definitions. In a cryptographic proof system, there are two parties: 
a verifier V and a prover P. V knows a public instance $\mathsf{x}$ and asks P to show that 
it has knowledge of a secret witness $\mathsf{w}$ satisfying a public predicate $\phi(x, w)$ from 
a predicate class $\Phi$ (a set of formulas), i.e., $\hat{\phi}(\mathsf{x}, \mathsf{w}) = \top$. Figure 1 illustrates the 
workflow. First, a trusted party runs an efficient (i.e., polytime in an implicit 
security parameter $\lambda$) algorithm Setup($\phi$), which produces a proving key pk and 
a verifying key vk. Then, P runs an efficient algorithm Prove(pk, $\mathsf{x}$, $\mathsf{w}$) $\to \pi$ and 
sends the resulting proof $\pi$ to V. Finally, V runs an efficient verification algorithm 
Verify(vk, $\mathsf{x}$, $\pi$) $\to \{\top, \bot\}$ that accepts or rejects the proof. A zero-knowledge 
argument of knowledge for class $\Phi$ is a tuple $\Pi$ = (Setup, Prove, Verify) with 
three informal properties for every $\phi \in \Phi$ and every $\mathsf{x} \in \mathrm{dom}(x)$, $\mathsf{w} \in \mathrm{dom}(w)$: 


- perfect completeness: if $\hat{\phi}(\mathsf{x}, \mathsf{w})$ holds, then Verify(vk, $\mathsf{x}$, $\pi$) holds; 

- computational knowledge soundness [9]: an efficient adversary that does not 
know $\mathsf{w}$ cannot produce a $\pi$ such that Verify(vk, $\mathsf{x}$, $\pi$) holds; and 

- zero-knowledge [22]: $\pi$ reveals nothing about $\mathsf{w}$, other than its existence. 


Technically, the system is an “argument” rather than a “proof” because sound- 
ness only holds against efficient adversaries. Also note that knowledge soundness 
requires that an entity must “know” a valid w’ to produce a proof; it is not enough 
for a valid w’ to simply exist. We give more precise definitions in Appendix A. 
Representations for ZKPs. As mentioned above, ZKP applications are manifold 
(Sect. 1), from cryptocurrencies to private registries. This breadth of applications 
is possible because ZKPs support a broad class of predicates. Most commonly, 
these predicates are expressed as rank-1 constraint systems (R1CSs). 
Recall that $\mathbb{F}_p$ is a prime-order finite field (also called a prime field). We will 
drop the subscript p when it is not important. In an R1CS, $\mathsf{x}$ and $\mathsf{w}$ are vectors of 
elements in $\mathbb{F}$; let $\mathsf{z} \in \mathbb{F}^m$ be their concatenation. The function $\hat{\phi}$ can be defined 
by three matrices $A, B, C \in \mathbb{F}^{n \times m}$; $\hat{\phi}(\mathsf{x}, \mathsf{w})$ holds when $A\mathsf{z} \circ B\mathsf{z} = C\mathsf{z}$, where $\circ$ is the 
element-wise product. Thus, $\phi$ can be viewed as n conjoined constraints, where 
each constraint i is of the form 
$(\sum_j a_{ij} z_j) \times (\sum_j b_{ij} z_j) \approx (\sum_j c_{ij} z_j)$ (where 
the $a_{ij}$, $b_{ij}$, and $c_{ij}$ are constant symbols from $\Sigma_{F_p}$, and the $z_j$ are a vector 
of variables of sort $\mathsf{FF}_p$). That is, each constraint enforces a single non-linear 
multiplication. 
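
To make the R1CS condition concrete, here is a minimal sketch that checks $Az \circ Bz = Cz$ over the integers modulo a prime. The function names, the small prime 101, and the convention of placing a constant 1 in the first slot of z are assumptions made for this illustration; they are not taken from CirC.

```python
from typing import List

Matrix = List[List[int]]

def mat_vec(m: Matrix, z: List[int], p: int) -> List[int]:
    """Matrix-vector product over the integers modulo p."""
    return [sum(a * b for a, b in zip(row, z)) % p for row in m]

def r1cs_holds(A: Matrix, B: Matrix, C: Matrix, z: List[int], p: int) -> bool:
    """Check that the element-wise product of Az and Bz equals Cz, row by row."""
    az, bz, cz = mat_vec(A, z, p), mat_vec(B, z, p), mat_vec(C, z, p)
    return all((l * r - o) % p == 0 for l, r, o in zip(az, bz, cz))

# Hypothetical instance: z = (1, x, w) with the single constraint w * w = x.
P = 101
A = [[0, 0, 1]]
B = [[0, 0, 1]]
C = [[0, 1, 0]]
print(r1cs_holds(A, B, C, [1, 9, 3], P))  # witness w = 3 for instance x = 9
print(r1cs_holds(A, B, C, [1, 9, 4], P))  # w = 4 is not a square root of 9
```

Each matrix row contributes exactly one multiplication, which is why the row count n is the cost measure the compiler tries to minimize.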


2.3 Compilation Targeting Zero Knowledge Proofs 


To write a ZKP about a high-level predicate $\phi$, that predicate is first compiled to 
an R1CS. A ZKP compiler from class $\Phi$ (a set of $\Sigma$-formulas) to class $\Phi'$ (a set 
of $\Sigma'$-formulas) is an efficient algorithm 
Compile($\phi \in \Phi$) $\to$ ($\phi' \in \Phi'$, $\mathsf{Ext}_x$, $\mathsf{Ext}_w$). 
Given a predicate $\phi(x, w)$, it returns a predicate $\phi'(x', w')$ as well as two efficient 
and deterministic algorithms, instance and witness extenders: 
$\mathsf{Ext}_x : \mathrm{dom}(x) \to \mathrm{dom}(x')$ and 
$\mathsf{Ext}_w : \mathrm{dom}(x) \times \mathrm{dom}(w) \to \mathrm{dom}(w')$ (see footnote 3). For example, CirC [46] can 
compile a Boolean-returning C function (in a subset of C) to an R1CS. 

At a high level, $\phi$ and $\phi'$ should be “equisatisfiable”, with $\mathsf{Ext}_x$ and $\mathsf{Ext}_w$ 
mapping satisfying values for $\phi$ to satisfying values for $\phi'$. That is, for all 
$\mathsf{x} \in \mathrm{dom}(x)$ and $\mathsf{w} \in \mathrm{dom}(w)$ such that $\hat{\phi}(\mathsf{x}, \mathsf{w}) = \top$, if 
$\mathsf{x}' = \mathsf{Ext}_x(\mathsf{x})$ and $\mathsf{w}' = \mathsf{Ext}_w(\mathsf{x}, \mathsf{w})$, then $\hat{\phi}'(\mathsf{x}', \mathsf{w}') = \top$. 
Furthermore, for any $\mathsf{x}$, it should be impossible to 
(efficiently) find $\mathsf{w}'$ satisfying $\hat{\phi}'(\mathsf{Ext}_x(\mathsf{x}), \mathsf{w}') = \top$ without knowing a $\mathsf{w}$ satisfying 
$\hat{\phi}(\mathsf{x}, \mathsf{w}) = \top$. In Sect. 5.1, we precisely define correctness for a predicate compiler. 

One can build a ZKP for class $\Phi$ from a compiler from $\Phi$ to $\Phi'$ and a ZKP for 
$\Phi'$. Essentially, one runs the compiler to get a predicate $\phi' \in \Phi'$, as well as $\mathsf{Ext}_x$ 
and $\mathsf{Ext}_w$. Then, one writes a ZKP to show that 
$\hat{\phi}'(\mathsf{Ext}_x(\mathsf{x}), \mathsf{Ext}_w(\mathsf{x}, \mathsf{w})) = \top$. In 
Appendix A, we give this construction in full and prove it is secure. 


Optimization. The primary challenge when using ZKPs is cost: typically, Prove 
is at least three orders of magnitude slower than checking ¢ directly [64]. Since 
Prove’s cost scales with n (the constraint count), it is critical for the compiler 
to minimize n. The space of optimizations is large and complex, for two reasons. 
First, the compiler can introduce fresh variables. Second, only equisatisfiability, 
not logical equivalence, is needed. Compilers in this space exploit equisatisfiability 
heavily to efficiently represent high-level constructs (e.g., Booleans, 
bit-vectors, arrays, ...) as an R1CS. 


3 For technical reasons, the runtime of $\mathsf{Ext}_x$ and the size of its description must be 
poly($\lambda$, |x|), not just poly($\lambda$) (Appendix A). 
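[Figure: the CirC pipeline in three stages. (1) A front-end converts the program 
(pgm) to IR. (2) Optimization passes simplify the IR to IR[$\Sigma_{BV} \cup \Sigma_F$]. 
(3) Lowering produces an R1CS; it consists of field-blasting, which yields 
IR[$\Sigma_F$], followed by flattening.] 


Fig. 2. The architecture of CirC 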


As a (simple!) example, consider the Boolean computation $a \approx c_1 \vee \cdots \vee c_k$. 
Assume that $c'_1, \dots, c'_k$ are variables of sort FF and that we add constraints 
$c'_i(1 - c'_i) \approx 0$ to ensure that $c'_i$ has to be 0 or 1 for each i. Assume further 
that $(c'_i \approx 1)$ encodes $c_i$ for each i. How can one additionally ensure that $a'$ 
(also of sort FF) is also forced to be equal to 0 or 1 and that $(a' \approx 1)$ is a 
correct encoding of a? Given that there are k − 1 ORs, natural approaches use 
O(k) constraints. One clever approach is to introduce variable $x'$ and enforce 
constraints $x'(\sum_i c'_i) \approx a'$ and $(1 - a')(\sum_i c'_i) \approx 0$. In any interpretation where 
any $c_i$ is true, the corresponding interpretation for $a'$ must be 1 to satisfy the 
second constraint; setting $x'$ to the sum’s inverse satisfies the first. If all $c_i$ are 
false, the first constraint ensures $a'$ is 0. This technique assumes the sum does 
not overflow; since ZKP fields are typically large (e.g., with p on the order of 
$2^{255}$), this is usually a safe assumption. 
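
The two-constraint OR encoding can be checked exhaustively in a toy field. The sketch below is illustrative only: the prime 7, the bound k = 3, and the function names are assumptions chosen so that brute force over every field value is cheap.

```python
from itertools import product

def or_constraints_hold(cs, a, x, p):
    """Both constraints of the encoding: x*sum(c) = a and (1 - a)*sum(c) = 0 (mod p)."""
    s = sum(cs) % p
    return (x * s - a) % p == 0 and ((1 - a) * s) % p == 0

def allowed_outputs(cs, p):
    """All field values a' for which some witness x' satisfies both constraints."""
    return {a for a in range(p) for x in range(p) if or_constraints_hold(cs, a, x, p)}

P, K = 7, 3  # small prime with K < P, so the sum of the bits cannot wrap around
for cs in product([0, 1], repeat=K):
    # The only admissible a' is the Boolean OR of the inputs.
    assert allowed_outputs(cs, P) == {int(any(cs))}
print("checked", 2 ** K, "input combinations")
```

If the field is too small for the sum (for instance p = 2 with two true inputs), the sum wraps to 0 and the encoding wrongly forces a' = 0, which is the overflow caveat mentioned above.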


CirC. CirC [46] is an infrastructure for building compilers from high-level 
languages (e.g., a C subset) to R1CSs. It has been used in research projects [4, 12], 
and in industrial R&D. Figure 2 shows the structure of an R1CS compiler built 
with CirC. First, the front-end of the compiler converts the source program 
into CirC-IR. CirC-IR is a term IR based on SMT-LIB that includes: Booleans, 
bit-vectors, fixed-size arrays, tuples, and prime fields. Second, the compiler 
optimizes and simplifies the IR so that the only remaining sorts are Booleans, 
bit-vectors, and the target prime field. Third, the compiler lowers the simplified 
IR to an R1CS predicate over the target field. For ZKPs built with CirC, the 
completeness, soundness, and zero-knowledge of the end-to-end system depend 
on the correctness of CirC itself. 


3 Overview and Example 


To start, we view CirC’s lowering pass as two passes (Fig. 2). The first pass, 
“(finite-)field-blasting,” converts a many-sorted IR (representable as a 
$(\Sigma_{BV} \cup \Sigma_F)$-formula) to a conjunction of field equations ($\Sigma_F$-equations). The second 
pass, “flattening,” converts this conjunction of field equations to an R1CS. 

Our focus is on verifying the first pass. We begin with a worked example 
of how to field-blast a small snippet of CirC-IR (Sect. 3.1). This example will 
illustrate four key ideas (Sect. 3.2) that inspire our field-blaster’s architecture. 


4 We list all CirC-IR operators for Booleans, bit-vectors, and prime fields in 
Appendix C. Almost all are from SMT-LIB. 
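Table 1. New variables and assertions when compiling the example $\phi$. 


clause | term from $\phi$ | assertions | new variables | notes 
1 | $x_0$ | | $x'_0$ | 
1 | $w_0$ | $w'_0(w'_0 - 1) \approx 0$ | $w'_0$ | 
1 | $x_0 \oplus w_0$ | $1 \approx 1 - w'_0 - x'_0 + 2 w'_0 x'_0$ | | 
2 | $x_1$ | | $x'_{1,u}$ | 
2 | $w_1$ | $w'_{1,i}(w'_{1,i} - 1) \approx 0$ | $w'_{1,i}$ | $i \in [0,3]$ 
2 | $x_1 +_{[4]} w_1$ | $s' \approx x'_{1,u} + \sum_{i=0}^{3} 2^i w'_{1,i}$ | $s'$ | 
2 | | $s'_i(s'_i - 1) \approx 0$ | $s'_i$ | $i \in [0,4]$ 
2 | | $s' \approx \sum_{i=0}^{4} 2^i s'_i$ | | 
2 | $x_1 +_{[4]} w_1 \approx w_1$ | $s'_i \approx w'_{1,i}$ | | $i \in [0,3]$ 
3 | $x_2$ | | $x'_{2,u}$ | 
3 | $x_2$ (bits) | $x'_{2,i}(x'_{2,i} - 1) \approx 0$ | $x'_{2,i}$ | $i \in [0,3]$ 
3 | | $x'_{2,u} \approx \sum_{i=0}^{3} 2^i x'_{2,i}$ | | 
3 | $x_2 \mathbin{\&} w_1 \approx x_2$ | $x'_{2,i} w'_{1,i} \approx x'_{2,i}$ | | $i \in [0,3]$ 
4 | $x_3, w_2$ | | $x'_3, w'_2$ | 
4 | $x_3 \approx w_2 \times w_2$ | $x'_3 \approx w'_2 \times w'_2$ | | 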

3.1 An Example of Field-Blasting 


We start with an example CirC-IR predicate expressed as a $(\Sigma_{BV} \cup \Sigma_F)$-formula: 

$\phi \triangleq (x_0 \oplus w_0) \wedge (w_1 +_{[4]} x_1 \approx w_1) \wedge (x_2 \mathbin{\&} w_1 \approx x_2) \wedge (x_3 \approx w_2 \times w_2)$    (1) 


The predicate includes: the XOR of two Booleans (“$\oplus$”), a bit-vector sum, a 
bit-vector AND, and a field product. $x_0$ and $w_0$ are of sort Bool, $x_1$, $x_2$, and $w_1$ are 
of sort $\mathsf{BV}_{[4]}$, and $x_3$ and $w_2$ are of sort $\mathsf{FF}_p$. We’ll assume that $p \gg 2^4$. Table 1 
summarizes the new variables and assertions we create during field-blasting; we 
describe the origin of each assertion and new variable in the next paragraphs. 


Lowering Clause One (Booleans). We begin with the Boolean term $(x_0 \oplus w_0)$. 
We will use 1 and 0 to represent $\top$ and $\bot$. We introduce variables $x'_0$ and $w'_0$ of 
sort $\mathsf{FF}_p$ to represent $x_0$ and $w_0$ respectively. To ensure that $w'_0$ is 0 or 1, we assert: 
$w'_0(w'_0 - 1) \approx 0$ (see footnote 5). $x_0 \oplus w_0$ is then represented by the expression 
$1 - x'_0 - w'_0 + 2 x'_0 w'_0$. 
Setting this equal to 1 enforces that $x_0 \oplus w_0$ must be true. These new assertions 
and fresh variables are reflected in the first three rows of the table. 


Lowering Clauses Two and Three (Bit-vectors). Before describing how to 
bit-blast the second and third clauses in $\phi$, we discuss bit-vector representations in 
general. A bit-vector t can be viewed as a sequence of b bits or as a non-negative 
integer less than $2^b$. These two views suggest two natural representations in a 
prime-order field: first, as one field element $t'_u$, whose unsigned value agrees with 
t (assuming the field’s size is at least $2^b$); second, as b elements $t'_0, \dots, t'_{b-1}$ 
that encode the bits of t as 0 or 1 (in our encoding, $t'_0$ is the least-significant 
bit and $t'_{b-1}$ is the most-significant bit). The first representation is simple, but 
with it, some field values (e.g., $2^b$) don’t correspond to any possible bit-vector. 
With the second approach, by including equations $t'_i(t'_i - 1) \approx 0$ in our system, we 
ensure that any satisfying assignment corresponds to a valid bit-vector. However, 
the extra b equations increase the size of our compiler’s output. 


5 Later (Sect. 5), we will see that “well-formedness” constraints like this are unnecessary 
for instance variables, such as $x_0$. 

We represent $\phi$’s $w_1$ bit-wise: as $w'_{1,0}, \dots, w'_{1,3}$, and we represent the instance 
variable $x_1$ as $x'_{1,u}$ (see footnote 6). For the constraint $w_1 +_{[4]} x_1 \approx w_1$, we compute the sum 
in the field and bit-decompose the result to handle overflow. First, we introduce 
new variable $s'$ and set it equal to $x'_{1,u} + \sum_{i=0}^{3} 2^i w'_{1,i}$. Then, we bit-decompose 
$s'$, requiring $s' \approx \sum_{i=0}^{4} 2^i s'_i$ and $s'_i(s'_i - 1) \approx 0$ for $i \in [0,4]$. Finally, we assert 
$s'_i \approx w'_{1,i}$ for $i \in [0,3]$. This forces the lowest 4 bits of the sum to be equal to $w_1$. 

The constraint $x_2 \mathbin{\&} w_1 \approx x_2$ is more challenging. Since $x_2$ is an instance 
variable, we initially encode it as $x'_{2,u}$. Then, we consider the bit-wise AND. 
There is no obvious way to encode a bit-wise operation, other than bit-by-bit. 
So, we convert $x'_{2,u}$ to a bit-wise representation: we introduce witness 
variables $x'_{2,0}, \dots, x'_{2,3}$ and equations $x'_{2,i}(x'_{2,i} - 1) \approx 0$ as well as equation 
$x'_{2,u} \approx \sum_{i=0}^{3} 2^i x'_{2,i}$. Then, for each i we require $x'_{2,i} w'_{1,i} \approx x'_{2,i}$. 
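
The lowering of the two bit-vector clauses can be sanity-checked by brute force. The following sketch is an illustration under stated assumptions: the prime 257 stands in for the real field, all function names are ours, and candidate bit values are restricted to 0 and 1 because the equations of the form b(b − 1) ≈ 0 admit no other solutions in a field.

```python
from itertools import product

P = 257  # assumed prime, much larger than 2**4

def from_bits(bits):
    """Unsigned value of a little-endian sequence of 0/1 values."""
    return sum(b << i for i, b in enumerate(bits))

def add_clause_holds(x1u, w_bits, s_bits):
    """Field constraints for w1 +[4] x1 = w1, given 0/1 bit values."""
    s = (x1u + from_bits(w_bits)) % P                  # s' = x'_{1,u} + sum 2^i w'_{1,i}
    decomposed = s == from_bits(s_bits) % P            # s' = sum 2^i s'_i, i in [0,4]
    low_bits_match = list(s_bits[:4]) == list(w_bits)  # s'_i = w'_{1,i}, i in [0,3]
    return decomposed and low_bits_match

def and_clause_holds(x2u, x_bits, w_bits):
    """Field constraints for x2 & w1 = x2, given 0/1 bit values."""
    split_ok = x2u % P == from_bits(x_bits) % P        # x'_{2,u} = sum 2^i x'_{2,i}
    return split_ok and all((xb * wb - xb) % P == 0 for xb, wb in zip(x_bits, w_bits))

def satisfiable_add(x1, w1):
    w_bits = [(w1 >> i) & 1 for i in range(4)]
    return any(add_clause_holds(x1, w_bits, s) for s in product([0, 1], repeat=5))

def satisfiable_and(x2, w1):
    w_bits = [(w1 >> i) & 1 for i in range(4)]
    return any(and_clause_holds(x2, x, w_bits) for x in product([0, 1], repeat=4))

for x in range(16):
    for w in range(16):
        # The field constraints are satisfiable exactly when the source clause holds.
        assert satisfiable_add(x, w) == ((w + x) % 16 == w)
        assert satisfiable_and(x, w) == ((x & w) == x)
print("constraints agree with the bit-vector semantics on all 256 inputs")
```

The fifth sum bit absorbs the carry, which is why the decomposition of s' uses five bits while only the lowest four are compared against w1.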

Lowering the Final Clause (Field Elements). Finally, we consider the field equation 
$x_3 \approx w_2 \times w_2$. Our target is also field equations, so lowering this is 
straightforward. We simply introduce primed variables and copy the equation. 


3.2 Key Ideas 

This example highlights four ideas that guide the design of our field-blaster: 


1. fresh variables and assertions: Field-blasting uses two primitive operations: 
creating new variables in $\phi'$ (e.g., $w'_0$ to represent $w_0$) and adding new assertions 
to $\phi'$ (e.g., $w'_0(w'_0 - 1) \approx 0$). 

2. encodings: For a term t in $\phi$, we construct a field term (or collection of field 
terms) in $\phi'$ that represent the value of t. For example, the Boolean $w_0$ is 
represented as the field element $w'_0$ that is 0 or 1. 

3. operator rules: if t is an operator applied to some arguments, we can encode 
t given encodings of the arguments. For example, if t is $x_0 \oplus w_0$, and $x_0$ is 
encoded as $x'_0$ and $w_0$ as $w'_0$, then t can be encoded as $1 - x'_0 - w'_0 + 2 x'_0 w'_0$. 

4. conversions: Some sorts can be represented by encodings of different kinds. 
If a term has multiple possible encodings, the compiler may need to convert 
between them to apply some operator rule. For example, we converted x2 
from an unsigned encoding to a bit-wise encoding before handling an AND. 


6 We represent $w_1$ bit-wise so that we can ensure the representation is well-formed with 
constraints $w'_{1,i}(w'_{1,i} - 1) \approx 0$. As previously noted, such well-formedness constraints 
are not needed for an instance variable like $x_1$ (see footnote 5). 
Table 2. Encodings for each term sort. Only bit-vectors have two encoding kinds. 


Variant contents (first three columns) | | | Semantics 
encoded term | kind | terms | validity condition 
$t : \mathsf{Bool}$ | bit | $f$ | $f \approx ite(t, 1, 0)$ 
$t : \mathsf{BV}_{[b]}$ | uint | $f$ | $f \approx \sum_i ite(t[i] \approx 1_{[1]}, 2^i, 0)$ 
$t : \mathsf{BV}_{[b]}$ | bits | $f_0, \dots, f_{b-1}$ | $\bigwedge_i f_i \approx ite(t[i] \approx 1_{[1]}, 1, 0)$ 
$t : \mathsf{FF}$ | field | $f$ | $t \approx f$ 


4 Architecture 


In this section, we present our field-blaster architecture. To compile a predicate 
$\phi$ to a system of field equations $\phi'$, our architecture processes each term t in $\phi$ 
using a post-order traversal. Informally, it represents each t as an “encoding” in 
$\phi'$: a term (or collection of terms) over variables in $\phi'$. Each encoding is produced 
by a small algorithm called an “encoding rule”. 

Below, we define the type of encodings Enc (Sect. 4.1), the five different types 
of encoding rules (Sect. 4.2), and a calculus that iteratively applies these rules 
to compile all of ¢ (Sect. 4.3). 


4.1 Encodings 


Table 2 presents our tagged union type Enc of possible term encodings. Each vari- 
ant comprises the term being encoded, its tag (the encoding kind), and a sequence 
of field terms. The encoding kinds are bit (a Boolean as 0/1), uint (a bit-vector as 
an unsigned integer), bits (a bit-vector as a sequence of bits), and field (a field 
term trivially represented as a field term). Each encoding has an intended seman- 
tics: a condition under which the encoding is considered valid. For instance, a bit 
encoding of Boolean t is valid if the field term f is equal to ite(t, 1, 0). 
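
One way to picture the tagged union and its validity conditions is the sketch below. It is a simplification under stated assumptions: encodings hold concrete field values rather than terms, the prime 101 is arbitrary, and the names `Enc` and `valid` are reused here only as hypothetical Python counterparts of the paper's notions.

```python
from dataclasses import dataclass
from typing import Tuple, Union

P = 101  # assumed small prime standing in for the ZKP field

@dataclass(frozen=True)
class Enc:
    kind: str               # "bit", "uint", "bits", or "field"
    terms: Tuple[int, ...]  # field values (concrete here; field terms in the paper)

def valid(e: Enc, t: Union[bool, int], width: int = 0) -> bool:
    """Validity conditions of Table 2, evaluated on a concrete value t."""
    if e.kind == "bit":
        return e.terms == (1 if t else 0,)
    if e.kind == "uint":
        return e.terms == (sum(((t >> i) & 1) << i for i in range(width)) % P,)
    if e.kind == "bits":
        return e.terms == tuple((t >> i) & 1 for i in range(width))
    if e.kind == "field":
        return e.terms == (t % P,)
    raise ValueError(f"unknown encoding kind: {e.kind}")

print(valid(Enc("bit", (1,)), True))
print(valid(Enc("uint", (5,)), 5, width=4))
print(valid(Enc("bits", (1, 0, 1, 0)), 5, width=4))
print(valid(Enc("bits", (1, 0, 2, 0)), 5, width=4))  # 2 is not a bit value
```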


4.2 Encoding Rules 


An encoding rule is an algorithm that takes and/or returns encodings, in order 
to represent some part of the input predicate as field terms and equations. 


Primitive Operations. A rule can perform two primitive operations: creating 
new variables and emitting assertions. In our pseudocode, the primitive function 
fresh(name, t, isInst) $\to x'$ creates a fresh variable. Argument isInst is a 
Boolean indicating whether $x'$ is an instance variable (as opposed to a witness). 
Argument t is a field term (over variables from $\phi$ and previously defined primed 
variables) that expresses how to compute a value for $x'$. For example, to create 
a field variable $w'$ that represents Boolean witness variable w, a rule can 
call fresh($w'$, ite(w, 1, 0), $\bot$). The compiler uses t to help create the $\mathsf{Ext}_x$ and 
$\mathsf{Ext}_w$ algorithms. A rule asserts a formula $t'$ (over primed variables) by calling 
assert($t'$). 
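
The two primitives can be modeled with a small recorder object. This is a hypothetical sketch, not CirC's API: the `Builder` class, its method names, and the use of Python callables in place of field terms are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

P = 101  # assumed small prime field
Env = Dict[str, int]

class Builder:
    """Hypothetical container for the two primitive operations."""

    def __init__(self) -> None:
        self.fresh_defs: List[Tuple[str, Callable[[Env], int], bool]] = []
        self.assertions: List[Callable[[Env], bool]] = []

    def fresh(self, name: str, definition: Callable[[Env], int], is_inst: bool) -> str:
        # Record how to compute the new variable from earlier values.
        self.fresh_defs.append((name, definition, is_inst))
        return name

    def assert_(self, formula: Callable[[Env], bool]) -> None:
        # Record a formula over primed variables.
        self.assertions.append(formula)

    def extend(self, env: Env) -> Env:
        """Evaluate the recorded definitions in order, as the extenders would."""
        out = dict(env)
        for name, definition, _ in self.fresh_defs:
            out[name] = definition(out) % P
        return out

b = Builder()
w_prime = b.fresh("w'", lambda env: 1 if env["w"] else 0, False)  # ite(w, 1, 0)
b.assert_(lambda env: (env[w_prime] * (env[w_prime] - 1)) % P == 0)
env = b.extend({"w": 1})
print(env[w_prime], all(a(env) for a in b.assertions))
```

Keeping the definition next to each fresh variable is what lets a compiler later derive the extender algorithms mechanically.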
fn variable(t, isInst) → Enc: 
    if isInst: 
        t' ← fresh(name(t) || "u", Σ_i ite(t[i] ≈ 1_[1], 2^i, 0), ⊤) 
        return t, uint, t' 
    else: 
        for i in [0, size(sort(t)) − 1]: 
            t'_i ← fresh(name(t) || i, ite(t[i] ≈ 1_[1], 1, 0), ⊥) 
            assert(t'_i (t'_i − 1) ≈ 0) 
        return t, bits, t'_0, ..., t'_{size(sort(t))−1} 

fn const(t) → Enc: 
    for i in [0, size(sort(t)) − 1]: 
        t'_i ← ite(t[i] ≈ 1_[1], 1, 0) 
    return t, bits, t'_0, ..., t'_{size(sort(t))−1} 

fn assertEq(e : Enc, e' : Enc): 
    if kind(e) = bits: 
        for i in [0, size(terms(e)) − 1]: 
            assert(terms(e)[i] ≈ terms(e')[i]) 
    elif kind(e) = uint: 
        assert(terms(e)[0] ≈ terms(e')[0]) 

fn convert(e : Enc, kind' : Kind) → Enc: 
    t ← encoded_term(e) 
    if kind(e) = bits and kind' = uint: 
        return t, uint, Σ_i 2^i terms(e)[i] 
    elif kind(e) = uint and kind' = bits: 
        e' ← variable(t, ⊥) 
        assert(terms(e)[0] ≈ Σ_i 2^i terms(e')[i]) 
        return e' 


Fig. 3. Pseudocode for some bit-vector rules: variable uses a uint encoding for instances 
and bit-splits witnesses to ensure they’re well-formed, const bit-splits the constant it’s 
given, assertEq asserts unsigned or bit-wise equality, and convert either does a bit-sum 
or bit-split. 
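
The conversion rule can be exercised on concrete values to see why its assertions pin down a unique bit-split. The sketch below assumes a tiny field (P = 11) and 3-bit vectors so that every candidate witness can be enumerated; the function names are ours and the check is a toy analogue of a bounded verification condition, not the paper's actual VC.

```python
from itertools import product

P = 11     # assumed prime with P > 2**WIDTH
WIDTH = 3

def bits_to_uint(bits):
    """convert(bits -> uint): the uint term is the weighted sum of the bits."""
    return sum((1 << i) * b for i, b in enumerate(bits)) % P

def uint_to_bits_ok(u, bits):
    """convert(uint -> bits): the assertions a candidate bit-split must satisfy."""
    well_formed = all((b * (b - 1)) % P == 0 for b in bits)
    return well_formed and (u - bits_to_uint(bits)) % P == 0

def honest_bits(t):
    """Witness values the prover computes: ite(t[i] = 1, 1, 0) for each i."""
    return [(t >> i) & 1 for i in range(WIDTH)]

for t in range(2 ** WIDTH):
    # Bit-sum direction: the honest bits recombine to the unsigned value.
    assert bits_to_uint(honest_bits(t)) == t
    # Bit-split direction: among all field-valued candidates, only the honest bits pass.
    accepted = [c for c in product(range(P), repeat=WIDTH) if uint_to_bits_ok(t, c)]
    assert accepted == [tuple(honest_bits(t))]
print("conversion checked for all", 2 ** WIDTH, "bit-vectors")
```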


Rule Types. There are five types of rules: (1) Variable rules variable(t, isInst) → 
e take a variable t and its instance/witness status and return an encoding of 
that variable made up of fresh variables. (2) Constant rules const(t) → e take a 
constant term t and produce an encoding of t comprising terms that depend only 
on t. Since t is a constant, the terms in e can be evaluated to field constants (see 
the calculus in Sect. 4.3; see also footnote 7). The const rule cannot call fresh or assert. (3) Equality 
rules assertEq(e, e') take two encodings of the same kind and emit assertions that 
equate the underlying terms. (4) Conversion rules convert(e, kind') → e' take an 
encoding and convert it to an encoding of a different kind. Conversions are only 
non-trivial for bit-vectors, which have two encoding kinds: uint and bits. (5) 
Operator rules apply to terms t of form $o(t_1, \dots, t_n)$. Each operator rule takes t, 
o, and encodings of the child terms $t_i$ and returns an encoding of t. Some operator 
rules require specific kinds of encodings; before using such an operator rule, our 
calculus (Sect. 4.3) calls the convert rule to ensure the input encodings are the 
correct kind. Figure 3 gives pseudocode for the first four rule types, as applied 
to bit-vectors. Figure 4 gives pseudocode for two bit-vector operator encoding 
rules. A field-blaster uses many operator rules: in our case study (Sect. 6) there 
are 46. 


7 Having const(t) return terms that depend on t (rather than directly returning 
constants) is useful for constructing verification conditions for const. 
fn bvZeroExt(t, o : Op, e : Enc): 
    if kind(e) = bits: 
        w ← size(terms(e)) 
        for i in [0, w − 1]: 
            t'_i ← terms(e)[i] 
        for i in [0, o.newBits − 1]: 
            t'_{w+i} ← 0 
        return t, bits, t'_0, ..., t'_{w+o.newBits−1} 
    else: 
        return t, kind(e), terms(e) 

fn bvMulUint(t, o : Op, e : [Enc]): 
    w ← size(sort(encoded_term(e[0]))) 
    W ← size(e) × w 
    assume W < ⌊log_2 p⌋ 
    s' ← Π_i terms(e_i)[0] 
    b ← ff2bv(W, s') 
    for i in [0, W − 1]: 
        t'_i ← fresh(i, ite(b[i], 1, 0), ⊥) 
        assert(t'_i (t'_i − 1) ≈ 0) 
    assert(s' ≈ Σ_{i=0}^{W−1} 2^i t'_i) 
    return t, bits, t'_0, ..., t'_{w−1} 


Fig. 4. Pseudocode for some bit-vector operator rules. bvZeroExt zero-extends a 
bit-vector; for bit-wise encodings, it adds zero bits, and for unsigned encodings, it 
simply copies the original encoding. bvMulUint multiplies bit-vectors, all assumed to be 
unsigned encodings. We show only the case where the multiplication cannot overflow 
in the field: in this case the rule performs the multiplication in the field, and bit-splits 
the result to implement reduction modulo $2^b$. The rules use ff2bv, which converts from 
a field element to a bit-vector (discussed in Sect. 6.1). 
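
A concrete-value rendering of the non-overflowing multiplication case might look as follows. This is a sketch under assumptions: it evaluates the rule on numbers instead of emitting terms, the internal `assert` statements stand in for the emitted field equations, and the prime 65537 is chosen only because its bit length comfortably exceeds the 8 bits needed for two 4-bit factors.

```python
def bv_mul_uint(values, width, p):
    """Multiply unsigned encodings in F_p, bit-split the product, keep `width` low bits."""
    total_bits = len(values) * width               # W = size(e) * w
    if not total_bits < p.bit_length() - 1:        # assume W < floor(log2 p)
        raise ValueError("product could overflow the field")
    s = 1
    for v in values:                               # s' = product of the uint terms
        s = (s * v) % p
    bits = [(s >> i) & 1 for i in range(total_bits)]  # witness bits, as ff2bv would give
    assert all((b * (b - 1)) % p == 0 for b in bits)  # each bit is 0 or 1
    assert s == sum((1 << i) * b for i, b in enumerate(bits)) % p
    return bits[:width]                            # low bits = product modulo 2**width

P = 65537  # assumed prime; floor(log2 P) = 16, which exceeds W = 8
for a in range(16):
    for b in range(16):
        low = bv_mul_uint([a, b], 4, P)
        assert sum(bit << i for i, bit in enumerate(low)) == (a * b) % 16
print("4-bit products agree with multiplication modulo 16")
```

Dropping the high bits after the split is what turns a field product into wrap-around bit-vector multiplication.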


4.3 Calculus 


We now give a non-deterministic calculus describing how our field-blaster applies 
rules to compile a predicate $\phi(x, w)$ into a system of field equations. 

A calculus state is a tuple of three items: $(E, A, F)$. The encoding store E is 
a (multi-)map from terms to sets of encodings. The assertions formula A is a 
conjunction of all field equations asserted via assert. The fresh variable definitions 
sequence F is a sequence consisting of pairs, where each pair (v, t) matches a 
single call to fresh(v, t, ...). 

Figure 5 shows the transitions of our calculus. We denote the result of a rule 
as $A', F', e' \leftarrow r(\dots)$, where $A'$ is a formula capturing any new assertions, $F'$ is 
a sequence of pairs capturing any new variable definitions, and $e'$ is the rule’s 
return value. We may omit one or more results if they are always absent for a 
particular rule. For encoding store E, $E \cup (t \mapsto e)$ denotes the store with e added 
to t’s encoding set. 

There are five kinds of transitions. The Const transition adds an encoding 
for a constant term. The const rule returns an encoding e whose terms depend 
on the constant c; $e'$ is a new encoding identical to e, except that each of its 
terms has been evaluated to obtain a field constant. The Var transition adds an 
encoding for a variable term. The Conv transition takes a term that is already 
encoded and re-encodes it with a new encoding kind. The kinds operator returns 
all legal values of kind for encodings of a given sort. The $\mathrm{Op}_r$ transition applies 
operator rule r. This transition is only possible if r’s operator kind agrees with o, 
and if its input encoding kinds agree with $\vec{e}$. The Finish transition applies when $\phi$ 
has been encoded. It uses const and assertEq to build assertions that hold when 
$\phi = \top$. Rather than producing a new calculus state, it returns the outputs of 
the calculus: the assertions and the variable definitions. 
Const 
    premises:   constant term c;  e ← const(c);  e' ← map(eval, e) 
    conclusion: E := E ∪ (c ↦ e') 

Var 
    premises:   variable term v;  A', F', e ← variable(v, isInst(v)) 
    conclusion: E := E ∪ (v ↦ e),  A := A ∧ A',  F := F || F' 

Conv 
    premises:   (t ↦ e) ∈ E;  kind ∈ kinds(sort(t));  A', F', e' ← convert(e, kind) 
    conclusion: E := E ∪ (t ↦ e'),  A := A ∧ A',  F := F || F' 

Op_r 
    premises:   (t_i ↦ e_i) ∈ E for each i;  t = o(t_1, ..., t_n);  A', F', e' ← r(t, o, e_1, ..., e_n) 
    conclusion: E := E ∪ (t ↦ e'),  A := A ∧ A',  F := F || F' 

Finish 
    premises:   (φ ↦ e) ∈ E;  e_⊤ ← const(⊤);  A', F' ← assertEq(e, e_⊤) 
    conclusion: return (A ∧ A', F || F') 


Fig. 5. The transition rules of our rewriting calculus. 


To meet the requirements of the ZKP compiler, our calculus must return two 
extension functions: $\mathsf{Ext}_x$ and $\mathsf{Ext}_w$ (Sect. 2.3). Both can be constructed from the 
fresh variable definitions F. One subtlety is that $\mathsf{Ext}_x(\mathsf{x})$ (which assigns values 
to fresh instance variables) is a function of $\mathsf{x}$ only; it cannot depend on the 
witness variables of $\phi$. We ensure this by allowing fresh instance variables to 
be created only by the variable rule, and only when it is called with isInst = $\top$. 
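
To show how the state (E, A, F) evolves and how the extenders fall out of F, here is a deliberately tiny field-blaster for Boolean terms. Everything in it is an assumption made for illustration: terms are nested tuples, field terms are Python callables, only AND and NOT have operator rules, and the prime 101 is arbitrary. It is not the CirC implementation.

```python
from itertools import product

P = 101  # assumed small prime field

def blast(phi, instance_vars):
    """Encode Boolean term phi bottom-up; return (assertions A, definitions F)."""
    E, A, F = {}, [], []  # encoding store, assertions, fresh variable definitions

    def encode(t):
        if t in E:                                   # term already encoded
            return E[t]
        op = t[0]
        if op == "const":                            # Const transition
            value = 1 if t[1] else 0
            enc = lambda env: value
        elif op == "var":                            # Var transition
            name, fresh = t[1], t[1] + "'"
            F.append((fresh, lambda env: 1 if env[name] else 0))
            if name not in instance_vars:            # witnesses need a bit check
                A.append(lambda env: env[fresh] * (env[fresh] - 1) % P == 0)
            enc = lambda env: env[fresh]
        else:                                        # Op transition, children first
            kids = [encode(k) for k in t[1:]]
            if op == "and":
                enc = lambda env: kids[0](env) * kids[1](env) % P
            elif op == "not":
                enc = lambda env: (1 - kids[0](env)) % P
            else:
                raise ValueError(f"no operator rule for {op}")
        E[t] = enc
        return enc

    top = encode(phi)
    A.append(lambda env: top(env) == 1)              # Finish: equate with const(True)
    return A, F

def evaluate(t, env):
    """Reference semantics of the Boolean source term."""
    if t[0] == "const":
        return t[1]
    if t[0] == "var":
        return env[t[1]]
    if t[0] == "and":
        return evaluate(t[1], env) and evaluate(t[2], env)
    return not evaluate(t[1], env)

phi = ("and", ("var", "x"), ("not", ("var", "w")))
A, F = blast(phi, instance_vars={"x"})
for x, w in product([False, True], repeat=2):
    env = {"x": x, "w": w}
    for name, definition in F:                       # run the extenders built from F
        env[name] = definition(env)
    assert all(a(env) for a in A) == evaluate(phi, env)
print("honest extension satisfies the assertions exactly when phi holds")
```

The recursion visits children before parents, which mirrors the post-order traversal, and the order of entries in F is the order in which an extender must compute the fresh values.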


Strategy. Our calculus is non-deterministic: multiple transitions are possible in 
some situations; for example, some conversion is almost always applicable. The 
strategy that decides which transition to apply affects field blaster performance 
(Appendix D) but not correctness. 


5 Verification Conditions 


In this section, we first define correctness for a ZKP compiler (Sect. 5.1). Then, 
we give verification conditions (VCs) for each type of encoding rule (Sect. 5.2). 
Finally, we show that if these VCs hold, our calculus is a correct ZKP compiler 
(Sect. 5.3). 


5.1 Correctness Definition 


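Definition 1 (Correctness). A ZKP compiler Compile($\phi$) $\to$ ($\phi'$, $\mathsf{Ext}_x$, $\mathsf{Ext}_w$) 
is correct if it is demonstrably complete and demonstrably sound. 


• demonstrable completeness: For all $\mathsf{x} \in \mathrm{dom}(x)$, $\mathsf{w} \in \mathrm{dom}(w)$ such that 
$\hat{\phi}(\mathsf{x}, \mathsf{w}) = \top$, 

    $\hat{\phi}'(\mathsf{Ext}_x(\mathsf{x}), \mathsf{Ext}_w(\mathsf{x}, \mathsf{w})) = \top$ 

• demonstrable soundness: There exists an efficient algorithm Inv($\mathsf{x}'$, $\mathsf{w}'$) $\to \mathsf{w}$ 
such that for all $\mathsf{x} \in \mathrm{dom}(x)$, $\mathsf{w}' \in \mathrm{dom}(w')$ such that $\hat{\phi}'(\mathsf{Ext}_x(\mathsf{x}), \mathsf{w}') = \top$, 

    $\hat{\phi}(\mathsf{x}, \mathsf{Inv}(\mathsf{Ext}_x(\mathsf{x}), \mathsf{w}')) = \top$ 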


Demonstrable completeness (respectively, soundness) requires the existence
of a witness for φ′ (resp., φ) when a witness exists for φ (resp., φ′); this existence
is demonstrated by an efficient algorithm Ext_w (resp., Inv) that computes the
witness.
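To make Definition 1 concrete, the following Python sketch checks both properties by brute force for a toy compiler. Everything in it is an illustrative assumption rather than part of the paper: the source predicate (a Boolean x equal to w1 AND w2), the modulus P = 5, the compiled predicate, and the names ext_x, ext_w, and inv, which stand in for Ext_x, Ext_w, and Inv.

```python
from itertools import product

P = 5  # small prime modulus, chosen only to keep brute force cheap
BOOLS = (False, True)

def phi(x, w):
    """Source predicate: Boolean x equals w1 AND w2."""
    return x == (w[0] and w[1])

def phi_prime(x_f, w_f):
    """Compiled predicate over F_P: witnesses are bits and x' = f1 * f2."""
    f1, f2 = w_f
    bits_ok = all((f * (f - 1)) % P == 0 for f in (f1, f2))
    return bits_ok and (x_f - f1 * f2) % P == 0

def phi_prime_buggy(x_f, w_f):
    """Same as phi_prime but without the bit constraints (hypothetical bug)."""
    f1, f2 = w_f
    return (x_f - f1 * f2) % P == 0

def ext_x(x):
    return int(x)

def ext_w(x, w):
    return (int(w[0]), int(w[1]))

def inv(x_f, w_f):
    return (w_f[0] == 1, w_f[1] == 1)

def demonstrably_complete(compiled):
    # Every satisfying (x, w) of phi maps to a satisfying input of compiled.
    return all(compiled(ext_x(x), ext_w(x, w))
               for x in BOOLS
               for w in product(BOOLS, repeat=2)
               if phi(x, w))

def demonstrably_sound(compiled):
    # Every satisfying witness of compiled maps back, via inv, to one of phi.
    return all(phi(x, inv(ext_x(x), w_f))
               for x in BOOLS
               for w_f in product(range(P), repeat=2)
               if compiled(ext_x(x), w_f))
```

In this sketch the variant without bit constraints stays complete but is no longer sound: with P = 5, the witness (2, 3) satisfies x′ = f1 · f2 for x′ = 1, yet Inv maps it to (false, false).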

Correct ZKP compilers are important for two reasons. First, since sequential 
composition preserves correctness, one can prove a multi-pass compiler is correct 
pass-by-pass. Second, a correct ZKP compiler from Φ to Φ′ can be used to
generalize a ZKP for Φ′ to one for Φ. We prove both properties in Appendix A.


Theorem 1 (Compiler Composition). If Compile′ and Compile″ are correct,
then the compiler Compose(Compile′, Compile″) (Appendix A) is correct.


Theorem 2 (ZKP Generalization). (informal) Given a correct ZKP compiler
Compile from Φ to Φ′ and a ZKP for Φ′, we can construct a ZKP for Φ.


5.2 Rule VCs 


Recall (Sect. 4) that our language manipulates encodings through five types of 
encoding rules. We give verification conditions for each type of rule. Intuitively, 
these capture the correctness of each rule in isolation. Next, we will show that they
imply the correctness of a ZKP compiler that follows our calculus.

Our VCs quantify over valid encodings. That is, they have the form: “for any
valid encoding e of term t, ...” We can quantify over an encoding e by making
each t_i ∈ terms(e) a fresh variable, and quantifying over the t_i. Encoding validity
is captured by a predicate valid(e, t), which is defined to be the validity condition
in Table 2. Each VC containing encoding variables e implicitly represents a
conjunction of instances of that VC, one for each possible tuple of kinds of e,
which is fixed for each instance. If a VC contains valid(e, t), the sort of t is
constrained to be compatible with kind(e). For a kind and a sort to be compatible,
they must occur in the same row of Table 2. We define the equality predicate
equal(e, e′) as ⋀_i terms(e)[i] ≈ terms(e′)[i].


Encoding Uniqueness. First, we require the uniqueness of valid encodings, for 
any fixed encoding kind. Table 3 shows the VCs that ensure this. Each row is a 
formula that must be valid, for all compatible encodings and terms. The first two 
rows ensure that there is a bijection from terms to their valid encodings (in the 
first row, we consider only instances for which kind(e) = kind(e’)). The function 
fromTerm(t, kind) → e maps a term and an encoding kind to a valid encoding of
that kind, and the function toTerm(e) → t maps a valid encoding to its encoded
term. The third and fourth rows ensure that fromTerm and toTerm are correctly 
defined. We will use toTerm in our proof of calculus soundness (Appendix B) 
and we will use fromTerm to optimize VCs for faster verification (Sect. 6.1). 
Table 3. VCs related to encoding uniqueness.


Property                  | Condition
valid encoding uniqueness | (valid(e, t) ∧ valid(e′, t)) → equal(e, e′)
valid encoding uniqueness | (valid(e, t) ∧ valid(e, t′)) → t ≈ t′
fromTerm correctness      | valid(fromTerm(t, kind), t)
toTerm correctness        | valid(e, toTerm(e))


Table 4. VCs for encoding rules.


Rule                    | Property      | Condition
Operator e′ ← r_o(e⃗)    | Sound         | (A ∧ ⋀_i valid(e_i, t_i)) → valid(e′, o(t⃗))
                        | Complete      | ((⋀_i valid(e_i, t_i)) → (A ∧ valid(e′, o(t⃗))))[F]
Equality r_=(e_1, e_2)  | Sound         | (A ∧ ⋀_i valid(e_i, t_i)) → (t_1 ≈ t_2)
                        | Complete      | (((t_1 ≈ t_2) ∧ ⋀_i valid(e_i, t_i)) → A)[F]
Conversion e′ ← r_→(e)  | Sound         | (A ∧ valid(e, t)) → valid(e′, t)
                        | Complete      | (valid(e, t) → (A ∧ valid(e′, t)))[F]
Variable e′ ← r_v(t)    | Sound (t ∈ w) | A → ∃t′. valid(e′, t′)
                        | Sound (t ∈ x) | (A → valid(e′, t))[F|x]
                        | Complete      | (A ∧ valid(e′, t))[F]
Constant e ← r_c(t)     | -             | valid(e, t)


For an example of the valid, fromTerm, and toTerm functions, consider a
Boolean b encoded as an encoding e with kind bit and whose terms consist
of a single field element f. Validity is defined as valid(e, b) = f ≈ ite(b, 1, 0),
toTerm(f) is defined as f ≈ 1, and fromTerm(b, bit) is (b, bit, ite(b, 1, 0)).
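The bit-encoding example can be played out concretely. The Python sketch below models the single field element f as an integer modulo a small prime P (the value 7 is an arbitrary assumption) and checks the four conditions of Table 3 by enumeration. The function names mirror the paper's valid, fromTerm, and toTerm; representing an encoding by the bare field element is a simplification. The toTerm check ranges only over encodings that are valid for some term, which is how this sketch reads that row.

```python
P = 7  # arbitrary small prime; field elements are integers modulo P
BOOLS = (False, True)
FIELD = range(P)

def valid(f, b):
    """valid(e, b): the field element equals ite(b, 1, 0)."""
    return f % P == (1 if b else 0)

def from_term(b):
    """fromTerm(b, bit): the canonical bit encoding of Boolean b."""
    return 1 if b else 0

def to_term(f):
    """toTerm(f): the Boolean that f encodes, namely f == 1."""
    return f % P == 1

# Row 1: a term has at most one valid encoding of this kind.
unique_encoding = all(f == g
                      for b in BOOLS for f in FIELD for g in FIELD
                      if valid(f, b) and valid(g, b))

# Row 2: an encoding is valid for at most one term.
unique_term = all(b == c
                  for f in FIELD for b in BOOLS for c in BOOLS
                  if valid(f, b) and valid(f, c))

# Row 3: fromTerm always produces a valid encoding.
from_term_ok = all(valid(from_term(b), b) for b in BOOLS)

# Row 4: toTerm recovers the encoded term (for encodings valid for some term).
to_term_ok = all(valid(f, to_term(f))
                 for f in FIELD
                 if any(valid(f, b) for b in BOOLS))
```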


VCs for Encoding Rules. Table 4 shows our VCs for the rules of Fig. 5. For each 
rule application, A and F denote, respectively, the assertions and the variable 
declarations generated when that rule is applied. We explain some of the VCs 
in detail. 

First, consider a rule r_o for operator o applied to inputs t_1, ..., t_k. The rule
takes input encodings e_1, ..., e_k and returns an output e′. It is sound if the
validity of its inputs and its assertions imply the validity of its output. It is
complete if the validity of its inputs implies its assertions and the validity of its
output, after substituting fresh variable definitions.
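As an illustration of the operator VCs, the Python sketch below checks soundness and completeness by enumeration for a hypothetical XOR rule on bit-encoded Booleans. The rule is an assumption made for this example, not taken from CirC: it introduces one fresh field variable f′, defines it as f1 + f2 − 2·f1·f2 (the set F), and asserts that same equation (the set A). The modulus 7 is arbitrary.

```python
from itertools import product

P = 7  # arbitrary small prime
BOOLS = (False, True)

def valid(f, b):
    """Bit encoding: f equals ite(b, 1, 0)."""
    return f % P == int(b)

def xor_definition(f1, f2):
    """F: the value assigned to the fresh output variable f'."""
    return (f1 + f2 - 2 * f1 * f2) % P

def xor_assertion(f1, f2, f_out):
    """A: the constraint the rule emits."""
    return (f_out - xor_definition(f1, f2)) % P == 0

def no_assertion(f1, f2, f_out):
    """A hypothetical buggy rule that forgets to constrain f'."""
    return True

def is_sound(assertion):
    # (A and valid inputs) must imply a valid output, for every f'.
    for f1, f2, f_out in product(range(P), repeat=3):
        for t1, t2 in product(BOOLS, repeat=2):
            if assertion(f1, f2, f_out) and valid(f1, t1) and valid(f2, t2):
                if not valid(f_out, t1 != t2):
                    return False
    return True

def is_complete(assertion, definition):
    # Valid inputs must imply A and a valid output once f' is replaced
    # by its definition from F.
    for f1, f2 in product(range(P), repeat=2):
        for t1, t2 in product(BOOLS, repeat=2):
            if valid(f1, t1) and valid(f2, t2):
                f_out = definition(f1, f2)
                if not (assertion(f1, f2, f_out) and valid(f_out, t1 != t2)):
                    return False
    return True
```

Dropping the assertion leaves the rule complete but unsound, since f′ is then unconstrained in the soundness VC.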

Second, consider a variable rule. Its input is a variable term t, and it returns 
e', a putative encoding thereof. Note that e’ does not actually contain t, though 
the substitutions in F may bind the fresh variables of e’ to functions of t. For 
the rule to be sound when t is a witness variable (t € w), the assertions must 
imply that e’ is valid for some term t’. For the rule to be sound when t is an 
instance variable (t € x), the assertions must imply that e’ is valid for t, when 
the instance variables in e′ are replaced with their definitions (F|x denotes F
restricted to its declarations of instance variables). For the variable rule to be
complete (for an instance or a witness), the assertions and the validity of e’ for 
t must follow from F. 

Third, consider a constant rule. Its input is a constant term t, and it returns 
an encoding e. Recall that the terms of e are always evaluated, yielding e’ which 
only contains constant terms. Thus, correctness depends only on the fact that e 
is always a valid encoding of the input t. This can be captured with a single VC. 


5.3 A Correct Field-Blasting Calculus 


Given rules that satisfy these verification conditions, we show that the calculus 
of Sect. 4.3 is a correct ZKP compiler. The proof is in Appendix B. 


Theorem 3 (Correctness). With rules that satisfy the conditions of Sect. 5.2, 
the calculus of Sect. 4.3 is demonstrably complete and sound (Def. 1). 


6 Case Study: A Verifiable Field-Blaster for CirC 


We implemented and partially verified a field-blaster for CirC [46]. Our implementation
is based on a refactoring of CirC’s original field blaster to conform
to our encoding rules (Sect. 4.2) and consists of ~850 lines of code (LOC).⁹ As
described below, we have (partially) verified our encoding rules, but trust
our calculus (Sect. 4.3, ~150 LOC) and our flattening implementations (Fig. 2,
~160 LOC).

While porting rules, we found 4 bugs in CirC’s original field-blaster 
(see Appendix G), including a severe soundness bug. Given a ZKP compiled with 
CirC, the bug allowed a prover to incorrectly compare bit-vectors. The prover, 
for example, could claim that the unsigned value of 0010 is greater than or less 
than that of 0001. A patch to fix all 4 bugs (in the original field blaster) has 
been upstreamed, and we are in the process of upstreaming our new field blaster 
implementation into CirC. 


6.1 Verification Evaluation 


Our implementation constructs the VCs from Sect. 5.2 and emits them as SMT- 
LIB (extended with a theory of finite fields [47]). We verify them with cvc5, 
because it can solve formulas over bit-vectors and prime fields [47]. The verifica- 
tion is partial in that it is bounded in two ways. We set b € N to be the maximum 
bit-width of any bit-vector and a € N to be the maximum number of arguments 
to any n-ary operator. In our evaluation, we used a = 4 and b = 4. These bounds 
are small, but they were sufficient to find the bugs mentioned above. 


8 The different soundness conditions for instance and witness variables play a key role 
in the proof of Theorem 3. Essentially: since the condition for instances replaces 
variables with their definitions, the validity of the encodings of instance variables 
need not be explicitly enforced in A. This is why some constraints could be omitted 
in our field-blasting example (see footnote 5).

9 Our implementation is in Rust, as is CirC.
Optimizing Completeness VCs. Generally, cvc5 verifies soundness VCs more
quickly than completeness VCs. This is surprising at first glance. To see why,
consider the soundness (S) and completeness (C) conditions for a conversion
rule from e to e′ that generates assertions A and definitions F:


S = (A ∧ valid(e, t)) → valid(e′, t)
C = (valid(e, t) → (A ∧ valid(e′, t)))[F]


In both, t is a variable, e contains variables, and there are variables in e′ and
A that are defined by F. In C, though, some variables are replaced by their
definitions in F, which makes the number of variables (and thus the search
space) seem smaller for C than for S. Yet, cvc5 is slower on C.

The problem is that, while the field operations in A are standard (e.g., +, ×,
and =), the definitions in F use a CirC-IR operator that (once embedded into
SMT-LIB) is hard for cvc5 to reason about. That operator, (ff2bv b), takes a
prime field element x and returns a bit-vector v. If x’s integer representative is
less than 2^b, then v’s unsigned value is equal to x; otherwise, v is zero.

The ff2bv operator is trivial to evaluate but hard to embed. cvc5’s SMT-LIB
extension for prime fields only supports +, × and =, so no operator can
directly relate x to v. Instead, we encode the relationship through b Booleans
that represent the bits of v. To test whether x < 2^b, we use the polynomial
f(x) = ∏_{i=0}^{2^b−1} (x − i), which is zero only on [0, 2^b − 1]. The bit-splitting essentially
forces cvc5 to guess v’s value; further, f’s high degree slows down the Gröbner
basis computations that form the foundation of cvc5’s field solver.
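The contrast between evaluating ff2bv and embedding it can be seen in a small Python sketch. It follows the description above: ff2bv returns x when x < 2^b and zero otherwise, and the range test is a product of 2^b linear factors over the field. The prime 23 and the helper names are assumptions for illustration, and the polynomial is the product form that matches the stated zero set [0, 2^b − 1].

```python
P = 23  # arbitrary small prime; requires 2**b <= P for the b used below

def ff2bv(x, b):
    """Unsigned value of the b-bit vector produced by (ff2bv b) on x."""
    return x if x < 2 ** b else 0

def range_poly(x, b):
    """f(x) = product over i in [0, 2^b - 1] of (x - i), evaluated mod P."""
    acc = 1
    for i in range(2 ** b):
        acc = (acc * (x - i)) % P
    return acc

def bits_of(v, b):
    """The b Booleans (least significant first) that represent v."""
    return [bool((v >> i) & 1) for i in range(b)]

def value_of(bits):
    """Recombine bits into the unsigned value they represent."""
    return sum(1 << i for i, bit in enumerate(bits) if bit)
```

Evaluating ff2bv is a single comparison, whereas the embedded range test has degree 2^b (8 factors for b = 3, 16 for b = 4), which is the growth that the text identifies as costly for Gröbner basis computations.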

To optimize verification of the completeness VCs, we reason about 
CirC-IR directly. First, we use the uniqueness of valid encodings and the 
fromTerm function. Since the VC assumes valid(e,t), we know e is equal to 
fromTerm(t, kind(e)). We use this equality to eliminate e from the completeness 
VC, leaving: 


(A ∧ valid(e′, t))[F][e ↦ fromTerm(t, kind(e))]


Since F defines all variables in A and e′, the only variable after substitution
is t. So, when t is a Boolean or small bit-vector, an exhaustive search is very
effective;¹⁰ we implemented such a solver in 56 LOC, using CirC’s IR as a library.
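The exhaustive strategy can be sketched in a few lines of Python. The conversion rule below is hypothetical and is not one of CirC's rules: it converts a 4-bit bit-vector from a list-of-bits encoding to a single packed field element, with F defining f′ as the weighted sum of the bits and A asserting the same equation. The prime 101, the width B, and all function names are assumptions for illustration.

```python
P = 101  # arbitrary prime larger than 2**B
B = 4    # bit-width bound, matching the b = 4 used in the evaluation

def from_term_bits(t):
    """fromTerm(t, bits): little-endian bit encoding of the bit-vector t."""
    return [(t >> i) & 1 for i in range(B)]

def pack_definition(bits):
    """F: f' := sum of 2^i * f_i."""
    return sum((1 << i) * f for i, f in enumerate(bits)) % P

def reversed_definition(bits):
    """A hypothetical buggy F that uses big-endian weights."""
    return sum((1 << (B - 1 - i)) * f for i, f in enumerate(bits)) % P

def pack_assertion(bits, f_out):
    """A: f' equals the little-endian weighted sum of the input bits."""
    return (f_out - pack_definition(bits)) % P == 0

def valid_packed(f_out, t):
    """Validity of the packed encoding: f' equals the unsigned value of t."""
    return f_out % P == t % P

def completeness_holds(definition):
    # After substituting e := fromTerm(t, kind(e)) and the definitions in F,
    # t is the only free variable, so we enumerate all 2^B values of t.
    for t in range(2 ** B):
        e = from_term_bits(t)
        f_out = definition(e)
        if not (pack_assertion(e, f_out) and valid_packed(f_out, t)):
            return False
    return True
```

With B = 4 the loop visits only 16 terms, which is why enumeration is practical for completeness VCs over Booleans and small bit-vectors.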

For soundness VCs, this approach is less effective. The fromTerm substitution
still applies, but if F introduces fresh field variables, they are not eliminated.
Thus, the final formula contains field variables, so exhaustion is infeasible.


Verification Results. We ran our VC verification on machines with Intel Xeon
E5-2637 v4 CPUs.¹¹ Each attempt is limited to one physical core, 83GB memory, and
30 min. Figure 6 shows the number of VCs verified by cvc5 and our exhaustive 
solver. As expected, the exhaustive solver is effective on completeness VCs for 
Boolean and bit-vector rules, but ineffective on soundness VCs for rules that 
introduce fresh field variables. There are four VCs that neither solver verifies 


10 So long as the exhaustive solver reasons directly about all CirC-IR operators. 
11 We omit the completeness VCs for ff2bv. See Appendix C.
Type  | Prop. | VCs | Verified (cvc5) | Verified (exhaust) | Verified (either) | Unver.
const | -     | 6   | 6               | 5                  | 6                 | 0
conv  | C     | 8   | 8               | 8                  | 8                 | 0
conv  | S     | 8   | 8               | 4                  | 8                 | 0
(one row illegible in the source)
op    | C     | 259 | 247             | 247                | 259               | 0
op    | S     | 263 | 259             | 126                | 259               | 4
uniq  | -     | 40  | 40              | 0                  | 40                | 0
var   | C     | 12  | 12              | 10                 | 12                | 0
var   | S     | 6   | 6               | 0                  | 6                 | 0

Fig. 6. VCs verified by different solvers. ‘uniq’ denotes the VCs of Table 3; others
are from Table 4. ‘C’ denotes completeness; ‘S’: soundness.


Metric      | Unverified | Verified
Time (s)    | 27.27      | 25.05
Mem. (GB)   | 6.56       | 6.42
Constraints | 559445     | 559445

Fig. 7. The performance of CirC with the verified and unverified field-blaster.
Metrics are summed over the 61 functions in the Z# standard library.


within 30 min: bvadd with (b = 4, a = 4), and bvmul with (b = 3, a = 4) and
(b = 4, a ≥ 3). Most other VCs verify instantly. In Appendix E, we analyze how
VC verification time depends on a and b.


6.2 Performance and Output Quality Evaluation 


We compare CirC with our field-blaster (“Verified”) against CirC with its original
field-blaster (“Unverified”)¹² on three metrics: compiler runtime, memory usage,
and the final R1CS constraint count. Our benchmark set is the standard library 
for CirC’s Z# input language (which extends ZoKrates [16,68] v0.6.2). Our 
testbed runs Linux with 32GB memory and an AMD Ryzen 2700. 

There is no difference in constraints, but the verified field-blaster slightly 
improves compiler performance: −8% time and −2% memory (Fig. 7). We think
that the small improvement is unrelated to the fact that the new field blaster is 
verified. In Appendix E, we discuss compiler performance further. 


7 Discussion 


In this work, we present the first automatically verifiable field-blaster. We view 
the field-blaster as a set of rules; if some (automatically verifiable) conditions 
hold for each rule, then the field-blaster is correct. We implemented a performant 
and partially verified field-blaster for CirC, finding 4 bugs along the way. 

Our approach has limitations. First, we require the field-blaster to be written 
as a set of encoding rules. Second, we only verify our rules for bit-vectors of 
bounded size and operators of bounded arity. Third, we assume that each rule 
is a pure function: for example, it doesn’t return different results depending on 


12 After fixing the bugs we found. See Sect. 6. 
the time. Future work might avoid the last two limitations through bit-width-
independent reasoning [42,43,67] and a DSL (and compiler) for encoding rules.
It would also be interesting to extend our approach to: a ZKP with a non- 
prime field [7,13], a compiler IR with partial or non-deterministic semantics, or 
a compiler with correctness that depends on computational assumptions. 
A Zero-Knowledge Proofs and Compilers


This appendix is available in the full version of the paper [49]. 


B Compiler Correctness Proofs 


This appendix is available in the full version of the paper [49]. 


C CirC-IR 


This appendix is available in the full version of the paper [49]. 


D Optimizations to the CirC Field-Blaster 


This appendix is available in the full version of the paper [49]. 


E Verified Field-Blaster Performance Details 


This appendix is available in the full version of the paper [49]. 


F Verifier Performance Details 


This appendix is available in the full version of the paper [49]. 


G Bugs Found in the CirC Field Blaster 


This appendix is available in the full version of the paper [49]. 
Abstract. The efficiency and the security of smart contracts are their
two fundamental properties, but they can be at odds: the use of optimizers
to enhance efficiency may introduce bugs and compromise security.
Our focus is on EVM (Ethereum Virtual Machine) block-optimizations, 
which enhance the efficiency of jump-free blocks of opcodes by eliminat- 
ing, reordering and even changing the original opcodes. We reconcile effi- 
ciency and security by providing the verification technology to formally 
prove the correctness of EVM block-optimizations on smart contracts using 
the Coq proof assistant. This amounts to the challenging problem of 
proving semantic equivalence of two blocks of EVM instructions, which is 
realized by means of three novel Coq components: a symbolic execution 
engine which can execute an EVM block and produce a symbolic state; a 
number of simplification lemmas which transform a symbolic state into 
an equivalent one; and a checker of symbolic states to compare the sym- 
bolic states produced for the two EVM blocks under comparison. 
Artifact: https://doi.org/10.5281/zenodo.7863483


Keywords: Coq · Ethereum Virtual Machine · Smart Contracts ·
Optimization · Theorem Proving


1 Introduction 


In many contexts, security requirements are critical, and formal verification today
plays an essential role in verifying and certifying these requirements. One such context
is the blockchain, in which software bugs in smart contracts have already caused
several high-profile attacks (e.g., [14-17,30,37]). There is hence huge interest and
investment in guaranteeing their correctness; e.g., Certora [1], Veridise [2], apriorit
[3], Consensys [4], and Dedaub [5] are companies that offer smart contract audits
using formal methods technology. In this context, efficiency is of high relevance
as well, as deploying and executing smart contracts has a cost (in the corresponding
cryptocurrency). Hence, optimization tools for smart contracts have
emerged in the last few years (e.g., ebso [29], SYRUP [12], GASOL [11], the solc
optimizer [9]). Unfortunately, there is a dichotomy between efficiency and correctness:
as optimizers can be rather complex tools (not formally verified), they might
introduce bugs, and potential users might be reluctant to optimize their code.
This has a number of disruptive consequences: owners will pay more to deploy
(non-optimized) smart contracts; clients will pay more to run transactions every
time they are executed; the blockchain will accept fewer transactions as they are
more costly. Rather than accepting such a dichotomy, our work tries to overturn
it by developing a fully automated formal verification tool for proving the
correctness of the optimized code.

The general problem addressed by the paper is formally verifying semantic
equivalence of two bytecode programs, an initial code I and an optimization of it
O, which is considered a great challenge in formal verification. For our purpose,
we narrow down the problem by (1) considering fragments of code that
are jump-free (i.e., they have neither loops nor branching), and by (2) considering
only stack EVM operations (memory/storage opcodes and other blockchain-specific
opcodes are not considered). These assumptions are realistic, as working
on jump-free blocks still allows proving correctness for optimizers that work at
the level of the blocks of the CFG (e.g., super-optimizers [11,12,29] and many
rule-based optimizations performed by the Solidity compiler [9]). Considering
only stack optimizations, and leaving out memory and storage simplifications
and blockchain-specific bytecodes, does not restrict the considered programs, as
we work on the smaller block partitions induced by the unhandled operations
found in the block (splitting it into the blocks before and after). Even in our
narrowed setting, the problem is challenging, as block-optimizations can include any
elimination, reordering, and even change of the original bytecodes.

Consider the next block I, taken from a real smart contract [8]. The GASOL
optimizer [11], relying on the commutativity of OR and AND, optimizes it to O:


I: PUSH2 0x100 PUSH1 0x1 PUSH1 0xa8 SHL SUB NOT SWAP1 SWAP2 AND PUSH1 0x8 SWAP2 SWAP1
   SWAP2 SHL PUSH2 0x100 PUSH1 0x1 PUSH1 0xa8 SHL SUB AND OR PUSH1 0x5

O: PUSH2 0x100 PUSH1 0x1 PUSH1 0xa8 SHL SUB DUP1 NOT SWAP2 PUSH1 0x8 SHL AND
   SWAP2 AND OR PUSH1 0x5


This saves 11 bytes because (1) the expression SUB(SHL(168,1),256), which corresponds
to “PUSH2 0x100 PUSH1 0x1 PUSH1 0xa8 SHL SUB”, is computed twice in I,
whereas O computes it once and duplicates the result with DUP1, saving 8 bytes;
and (2) two SWAPs are needed instead of 5, saving 3 more bytes.
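The byte counts can be reproduced with a short Python sketch. It relies only on the EVM encoding rule that PUSHn occupies one opcode byte plus n immediate bytes, while every other opcode used here occupies one byte. The helper names block_size and count_swaps are hypothetical.

```python
BLOCK_I = ("PUSH2 0x100 PUSH1 0x1 PUSH1 0xa8 SHL SUB NOT SWAP1 SWAP2 AND "
           "PUSH1 0x8 SWAP2 SWAP1 SWAP2 SHL PUSH2 0x100 PUSH1 0x1 PUSH1 0xa8 "
           "SHL SUB AND OR PUSH1 0x5")
BLOCK_O = ("PUSH2 0x100 PUSH1 0x1 PUSH1 0xa8 SHL SUB DUP1 NOT SWAP2 "
           "PUSH1 0x8 SHL AND SWAP2 AND OR PUSH1 0x5")

def block_size(block):
    """Size in bytes of a block given as whitespace-separated mnemonics."""
    tokens = block.split()
    size, i = 0, 0
    while i < len(tokens):
        op = tokens[i]
        if op.startswith("PUSH"):
            size += 1 + int(op[4:])  # opcode byte plus n immediate bytes
            i += 2                   # skip the immediate operand token
        else:
            size += 1
            i += 1
    return size

def count_swaps(block):
    """Number of SWAPn instructions in the block."""
    return sum(1 for tok in block.split() if tok.startswith("SWAP"))

saved = block_size(BLOCK_I) - block_size(BLOCK_O)
```

Under this accounting I occupies 32 bytes and O occupies 21, so saved is 11. The repeated five-instruction prefix costs 9 bytes against 1 byte for DUP1.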

This paper proposes a technique, and a corresponding tool, to automatically
verify the correctness of EVM block-optimizations (such as those above) on smart
contracts using the Coq proof assistant. This amounts to the challenging problem of
proving semantic equivalence of two blocks of EVM instructions, which is realized
by means of three main components which constitute our main contributions (all
formalized and proven correct in Coq): (1) a symbolic interpreter in Coq to
symbolically execute the EVM blocks I and O and produce resulting symbolic states
S_I and S_O, (2) a series of simplification rules, which transform S_I and S_O into
equivalent ones S′_I and S′_O, (3) a checker of symbolic states in Coq to decide if
two symbolic states S′_I and S′_O are semantically equivalent.


2 Background 


The Ethereum VM (EVM) [38] is a stack-based VM with a word size of 256 bits
that is used to run the smart contracts on the Ethereum blockchain. The EVM
has the following categories of bytecodes: (1) Stack operations; (2) Arithmetic 
operations; (3) Comparison and bitwise logic operations; (4) Memory and stor- 
age manipulation; (5) Control flow operations; (6) Blockchain-specific opcodes, 
e.g., block and transaction environment information, compute hash, calls, etc. 
The first three types of opcodes are handled within our verifier, and handling 
optimizations on opcodes of types 4-6 is discussed in Sect. 6. 

The focus of our work is on optimizers that perform optimizations only at 
the level of the blocks of the CFG (i.e., intra-block optimizations). A well-known 
example is the technique called super-optimization [26] which, given a loop-free 
sequence of instructions searches for the optimal sequence of instructions that is 
semantically equivalent to the original one and has optimal cost (for the consid- 
ered criteria). This technique dates back to 1987 and has had a revival [25,31] 
thanks to the availability of SMT solvers that are able to do the search efficiently. 
We distinguish two types of possible intra-block optimizations: (i) Rule-based 
optimizations which consist in applying arithmetic/bitwise simplifications like 
ADD(X,0)=X or NOT(NOT(X))=X (see a complete list of these rules in App. A 
in [10]); and (ii) Stack-data optimizations which consist in searching for alter- 
native stack operations that lead to an output stack with exactly the same data. 


Example 1 (Intra-block optimizations). The rule-based optimization (i) X+0 →
X simplifies the block “PUSH1 0x5, PUSH1 0x0, ADD” to “PUSH1 0x5”. On the
other hand, stack-data optimizations (ii) can optimize to “ADD, DUP1” the block 
“DUP2, DUP2, ADD, SWAP2, ADD”, as duplicating the operands and repeating 
the ADD operation is the same as duplicating the result. Unlike rule-based opti- 
mization, stack-data optimizations cannot be expressed as simple patterns that 
can be easily recognized. 


The first type of optimizations are applied by the optimizer integrated in 
the Solidity compiler [9] as rule transformations, and they are also applied by 
EVM optimizers in different ways. ebso [29] encodes the semantics of arithmetic 
and bitwise operations in the SMT encoding so that the SMT solver searches 
for these optimizations together with those of type (ii). Instead, SYRUP [12] and 
GASOL [11] apply rule-based optimizations in a pre-phase and leave to the SMT 
solver only the search for the second type of optimizations. This classification of 
optimizations is also relevant for our approach as (i) will require integrating and 
proving all simplification rules correct (Sect. 4.2) while (ii) are implicit within 
the symbolic execution (Sect. 4.1). A block of EVM code that has been subject 
to optimizations of the two types above is in principle “provable” using our tool. 
There is not much work yet on formalizing the EVM semantics in Coq. One of
the most developed approaches is [22], which is a definition of the EVM semantics 
in the Lem [28] language that can be exported to interactive theorem provers like 
Isabelle/HOL or Coq. According to the comparison in [21], this implementation 
of EVM “is executable and passes all of the VM tests except for those dealing 
with more complicated intercontract execution”. However, we have decided not 
to use it for our checker due to three reasons: (a) the generated Coq code from 
Lem definitions is not “necessarily idiomatic” and thus it would generate a very 
complex EVM formalization in Coq that would make theorems harder to state 
and prove; (b) the author of the Lem definition states that “the Coq version of 
the EVM definition is highly experimental”; and (c) it is not kept up-to-date.

The other most developed implementation of the EVM semantics in Coq that
we have found is [23]. It supports all the basic EVM bytecodes we consider in our
checker, and looked promising as our starting point. The implementation uses
Bedrock Bit Vectors (bbv) [7] for representing the EVM 256-bit values, as do we.
It is not a full formalization of the EVM because it does not support calling or
creation of smart contracts, but it provides a function that simulates the successive
application of opcodes to the given execution state, call info and Ethereum
state mocks. The latter two pieces of information would add complexity and
are not needed for our purpose. Therefore, we decided to develop our own EVM
formalization in Coq (presented in Sect. 3) which builds upon some ideas of [23],
but introduces only the minimal elements we need to handle the instructions
supported by the checker. This way the proofs will be simpler and more concise.


3 EVM Semantics in Coq 


Our EVM formalization is a concrete interpreter that executes a block of EVM 
instructions. For representing EVM words we use EVMWord that stands for the type 
“word 256” of the bbv library [7]. For representing instructions we use: 


Inductive stack_op_instr :=
| ADD | MUL | ...

Inductive instr :=
| PUSH (size: nat) (w: EVMWord)
| POP
| DUP (pos: nat)
| SWAP (pos: nat)
| StackInstr (label: stack_op_instr).


Type stack_op_instr defines instructions that operate only on the stack, i.e.,
each pops a fixed number of elements and pushes a single value back (see
App. B in [10] for the full list). Type instr encapsulates this category together
with the stack manipulation instructions (PUSH, etc.). The type block stands for
“list instr”.

To keep the framework general, and simplify the proofs, the actual implementations
of instructions from stack_op_instr are provided to the interpreter as
input. For this, we use a map that associates instructions to implementations:


Inductive stack_operation :=
| StackOp (comm: bool) (n : nat) (f : list EVMWord → option EVMWord).
Definition stack_op_map := map stack_op_instr stack_operation.


The type stack_operation defines an implementation for a given operation: comm
indicates if the operation is commutative; n is the number of stack elements to
be removed and passed to the operation; and f is the actual implementation.
The type stack_op_map maps keys of type stack_op_instr to values of type
stack_operation. Supposing evm_add and evm_mul are implementations of ADD and
MUL (see App. C in [10]), the actual stack operations map is constructed as:


Definition evm_stack_opm : stack_op_map :=
  ADD ↦ StackOp true 2 evm_add; MUL ↦ StackOp true 2 evm_mul; ...


In addition, we require the operations in the map to be valid with respect to 
the properties that they claim to satisfy (e.g., commutativity), and that when 
applied to the right number of arguments they should succeed (i.e., do not return 
None). We refer to this property as valid_stack_op_map. 

An execution state (or simply state) includes only a stack (currently we
support only stack operations), which is a list of EVMWord, and the interpreter
is a function that takes a block, an initial state, and a stack operations map,
and iteratively executes each of the block’s instructions:


Definition stack := list EVMWord.
Inductive state :=
| ExState (stk: stack).
Fixpoint concr_int (p: block) (st: state) (ops: stack_op_map): option state := ...


The result can be either Some st, or None in case of an error, which can be caused
only by stack overflow. In particular, we are currently not taking into account
the amount of gas needed to execute the block. Our implementation follows the
EVM semantics [38]; given the simplicity of the supported operations, the
concrete interpreter is a minimal trusted computing base. In the future, we plan
to test it using the EVM test suite.
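For intuition, the following Python sketch mirrors the shape of concr_int: a stack of 256-bit words, an operations map from names to (commutative, arity, implementation), and a result of None on failure. It is not the Coq development. The textual parser, the particular operations in OPS, the treatment of underflow as a failure, and the random-testing helper agree are assumptions made for this illustration, and random testing gives evidence of equivalence, not a proof.

```python
import random

MASK = 2**256 - 1     # EVM words are 256-bit
STACK_LIMIT = 1024    # maximum EVM stack size

# Operations map: name -> (commutative, arity, implementation).
# Arguments are passed top-of-stack first.
OPS = {
    "ADD": (True, 2, lambda a: (a[0] + a[1]) & MASK),
    "MUL": (True, 2, lambda a: (a[0] * a[1]) & MASK),
    "SUB": (False, 2, lambda a: (a[0] - a[1]) & MASK),
    "AND": (True, 2, lambda a: a[0] & a[1]),
    "OR": (True, 2, lambda a: a[0] | a[1]),
    "NOT": (False, 1, lambda a: a[0] ^ MASK),
    "SHL": (False, 2, lambda a: (a[1] << a[0]) & MASK if a[0] < 256 else 0),
}

def parse(text):
    """Turn 'PUSH1 0x5 SWAP2 ADD' into a list of instruction tuples."""
    toks, block, i = text.split(), [], 0
    while i < len(toks):
        t = toks[i]
        if t.startswith("PUSH"):
            block.append(("PUSH", int(toks[i + 1], 16)))
            i += 1
        elif t.startswith("DUP"):
            block.append(("DUP", int(t[3:])))
        elif t.startswith("SWAP"):
            block.append(("SWAP", int(t[4:])))
        else:
            block.append((t,))
        i += 1
    return block

def concr_int(block, stack, ops=OPS):
    """Run block on stack (index 0 is the top); None signals failure."""
    stk = list(stack)
    for ins in block:
        name = ins[0]
        if name == "PUSH":
            stk.insert(0, ins[1] & MASK)
        elif name == "POP":
            if not stk:
                return None
            stk.pop(0)
        elif name == "DUP":
            if len(stk) < ins[1]:
                return None
            stk.insert(0, stk[ins[1] - 1])
        elif name == "SWAP":
            if len(stk) <= ins[1]:
                return None
            stk[0], stk[ins[1]] = stk[ins[1]], stk[0]
        else:
            _, n, f = ops[name]
            if len(stk) < n:
                return None
            args, stk = stk[:n], stk[n:]
            stk.insert(0, f(args))
        if len(stk) > STACK_LIMIT:
            return None
    return stk

def agree(p1, p2, k, trials=200, seed=0):
    """Randomly test that p1 and p2 succeed with equal results on size-k stacks."""
    rng = random.Random(seed)
    for _ in range(trials):
        stk = [rng.getrandbits(256) for _ in range(k)]
        r1, r2 = concr_int(p1, stk), concr_int(p2, stk)
        if r1 is None or r1 != r2:
            return False
    return True
```

For example, agree(parse("DUP2 DUP2 ADD SWAP2 ADD"), parse("ADD DUP1"), 3) exercises the stack-data optimization of Example 1 on random initial stacks.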


4 Formal Verification of EVM-Optimizations in Coq 


Two jump-free blocks p1 and p2 are equivalent wrt. an initial stack size k if,
for any initial stack of size k, the executions of p1 and p2 succeed and lead to
the same state. Formally:


Definition sem_eq_blocks (p1 p2: block) (k: nat) (ops: stack_op_map) : Prop :=
  ∀ (in_st: state) (in_stk: stack),
    get_stack in_st = in_stk → length in_stk = k →
    ∃ (out_st : state), concr_int p1 in_st ops = Some out_st ∧
                        concr_int p2 in_st ops = Some out_st


Note that when concr_int returns None for both p1 and p2, they are not considered
equivalent because in the general case they can fail for different reasons.
Note also that EVM operations are deterministic, so if concr_int evaluates to
a successful final state out_st, it will be unique.

An EVM block equivalence checker is a function that takes two blocks and the size
of the initial stack, and returns true/false. Providing the size k of the initial
stack is not a limitation of the checker, as this information is statically known
in advance. Note that the maximum stack size in EVM is bounded by 1024, and
that if the execution (of one or both blocks) wrt. this concrete initial stack
size leads to stack underflow/overflow, they cannot be reported equivalent. The
soundness of the equivalence checker is stated as follows:


Definition eq_block_chkr_snd (chkr : block → block → nat → bool) : Prop :=
  ∀ (p1 p2: block) (k: nat),
    chkr p1 p2 k = true → sem_eq_blocks p1 p2 k evm_stack_opm


Given two blocks p1 and p2, checking their equivalence (in Coq) has the following
components: (i) Symbolic Execution (Sect. 4.1): it is based on an interpreter
that symbolically executes a block, wrt. an initial symbolic stack of size k,
and generates a final symbolic stack. It is applied on both p1 and p2 to generate
their corresponding symbolic output states S_1 and S_2. (ii) Rule optimizations
(Sect. 4.2): it is based on simplification rules that are often applied by program
optimizers, which rewrite symbolic states to equivalent “simpler” ones. This step
simplifies S_1 and S_2 to S′_1 and S′_2. (iii) Equivalence Checker (Sect. 4.3): it receives
the simplified symbolic states, and determines if they are equivalent for any concrete
instantiation of the symbolic input stack. It takes into account, for example,
the fact that some stack operations are commutative.


4.1 EVM Symbolic Execution in Coq 


Symbolic execution takes an initial symbolic state (i.e., stack) [s_0, ..., s_{k−1}], a
block, and a map of stack operations, and generates a final symbolic state (i.e.,
stack) with symbolic expressions, e.g., [5 + s_0, s_1, s_2], representing the corresponding
computations. In order to incorporate rule-based optimizations in a simple
and efficient way, we want to avoid compound expressions such as 5 + (s_0 * s_1),
and instead use temporary fresh variables together with a corresponding map that
assigns them to simpler expressions. E.g., the stack [5 + (s_0 * s_1), s_2] would be
represented as a tuple ([e_1, s_2], {e_1 ↦ 5 + e_0, e_0 ↦ s_0 * s_1}) where the e_i are fresh
variables. To achieve this, we define the symbolic stack as a list of elements that
can be numeric constant values, initial stack variables or fresh variables:


Inductive sstack_val : Type := 


| Val (val: EVMWord) | InStackVar (var: nat) | FreshVar (var: nat). 
Definition sstack := list sstack_val. 


and the map that assigns meaning to fresh variables is a list that maps each 
fresh variable to a sstack_val, or to a compound expression: 


Inductive smap_val : Type :=
| SymBasicVal (val: sstack_val)
| SymOp (opcode : stack_op_instr) (args : list sstack_val).
Definition smap := list (nat*smap_val).


Finally, a symbolic state is defined as a SymState term where k is the size of 
the initial stack, maxid is the maximum id used for fresh variables (kept for 
efficiency), sstk is a symbolic stack, and m is the map of fresh variables. 

Inductive sstate : Type := | SymState (k maxid: nat) (sstk: sstack) (m: smap). 


Example 2 (Symbolic execution). Given p1 = “PUSH1 0x5 SWAP2 MUL ADD” and 
p2 = “PUSH1 0x0 ADD MUL PUSH1 0x5 ADD”, symbolically executing them with 
k = 3 we obtain the symbolic states represented by sst1 = ([e1, s2], {e1 ↦ e0 + 
5, e0 ↦ s1 * s0}) and sst2 = ([e2, s2], {e2 ↦ 5 + e1, e1 ↦ e0 * s1, e0 ↦ 0 + s0}). 
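The construction in Example 2 can be replayed with a small Python sketch. This is only an illustration, not the Coq development: the tuple encoding of stack elements, the helper name `sym_exec`, and the tiny `ARITY` table (covering only ADD and MUL) are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# A stack element is ("val", n), ("in", i) for the initial variable s_i,
# or ("fresh", i) for the fresh variable e_i (illustrative encoding).
Elem = Tuple[str, int]
# The map binds each fresh-variable id to (opcode, [arguments]).
SMap = Dict[int, Tuple[str, List[Elem]]]

ARITY = {"ADD": 2, "MUL": 2}  # hypothetical, minimal stack-operations map


def sym_exec(block: List[str], k: int) -> Optional[Tuple[List[Elem], SMap]]:
    """Symbolically execute `block` from the stack [s_0, ..., s_{k-1}] (top first).

    Returns (stack, map), or None on failure (underflow, unknown opcode).
    """
    stack: List[Elem] = [("in", i) for i in range(k)]
    smap: SMap = {}
    for instr in block:
        parts = instr.split()
        op = parts[0]
        if op.startswith("PUSH"):
            stack.insert(0, ("val", int(parts[1], 16)))
        elif op.startswith("SWAP"):
            n = int(op[4:])
            if len(stack) <= n:
                return None
            stack[0], stack[n] = stack[n], stack[0]
        elif op in ARITY:
            n = ARITY[op]
            if len(stack) < n:
                return None
            args, stack = stack[:n], stack[n:]
            fresh = len(smap)              # next unused fresh-variable id
            smap[fresh] = (op, args)       # bind e_fresh to a flat expression
            stack.insert(0, ("fresh", fresh))
        else:
            return None
    return stack, smap


p1 = ["PUSH1 0x5", "SWAP2", "MUL", "ADD"]
p2 = ["PUSH1 0x0", "ADD", "MUL", "PUSH1 0x5", "ADD"]
sst1 = sym_exec(p1, 3)
sst2 = sym_exec(p2, 3)
```

Every operation introduces exactly one fresh variable whose arguments are already-existing stack elements, so each binding only refers to fresh variables with smaller indices.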


Note that we impose some requirements on symbolic states to be valid. E.g., 
for any element i ↦ v of the fresh variables map, all fresh variables that appear 
in v have smaller indices than i. We refer to these requirements as valid_sstate. 

Given a symbolic (final) state and a concrete initial state, we can convert 
the symbolic state into a concrete one by replacing each s; by its corresponding 
value, and evaluating the corresponding expressions (following their definition in 
the stack operations map). We have a function to perform this evaluation that 
takes the stack operations map as input: 


Definition eval_sstate (in_st: state) (sst: sstate) (ops : stack_op_map) 
  : option state := ... 
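As a rough illustration of this evaluation step, the following Python sketch instantiates a symbolic stack on a concrete input stack. The encoding of elements, the `OPS` table, and the function names are assumptions for this example only; the actual Coq function works on full states and a verified operations map.

```python
from typing import Dict, List, Optional, Tuple

Elem = Tuple[str, int]                      # ("val", n) | ("in", i) | ("fresh", i)
SMap = Dict[int, Tuple[str, List[Elem]]]    # fresh id -> (opcode, arguments)

WORD = 2 ** 256  # EVM words are 256-bit, so arithmetic wraps around
OPS = {          # hypothetical, minimal stack-operations map
    "ADD": lambda a, b: (a + b) % WORD,
    "MUL": lambda a, b: (a * b) % WORD,
}


def eval_elem(e: Elem, in_stk: List[int], smap: SMap) -> Optional[int]:
    """Evaluate one symbolic element; None signals failure."""
    kind, x = e
    if kind == "val":
        return x
    if kind == "in":
        return in_stk[x] if x < len(in_stk) else None
    if x not in smap:
        return None
    op, args = smap[x]
    vals = [eval_elem(a, in_stk, smap) for a in args]
    if op not in OPS or any(v is None for v in vals):
        return None
    return OPS[op](*vals)


def eval_sstack(in_stk: List[int], sstk: List[Elem], smap: SMap) -> Optional[List[int]]:
    """Replace each s_i by its concrete value and evaluate all expressions."""
    out = [eval_elem(e, in_stk, smap) for e in sstk]
    return None if any(v is None for v in out) else out


# The two symbolic states of Example 2, evaluated on the stack [2, 3, 4].
m1 = {0: ("MUL", [("in", 1), ("in", 0)]), 1: ("ADD", [("fresh", 0), ("val", 5)])}
m2 = {0: ("ADD", [("val", 0), ("in", 0)]),
      1: ("MUL", [("fresh", 0), ("in", 1)]),
      2: ("ADD", [("val", 5), ("fresh", 1)])}
out1 = eval_sstack([2, 3, 4], [("fresh", 1), ("in", 2)], m1)
out2 = eval_sstack([2, 3, 4], [("fresh", 2), ("in", 2)], m2)
```

The recursion terminates only because valid states bind each fresh variable to expressions over smaller indices; the sketch does not check that requirement itself.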


Our symbolic execution engine is a function that takes the size of the initial 
stack, a block, a map of stack operations, and generates a symbolic final state: 


Definition sym_exec (p: block) (k: nat) (ops: stack_op_map) : option sstate := ... 


Note that we do not pass an initial symbolic state, but rather we construct it 
inside using k. Also, the result can be None in case of failure (the causes are the 
same as those of conc_interpreter). 

Soundness of sym_exec means that whenever it generates a symbolic state as 
a result, then the concrete execution from any stack of size k will succeed and 
produce a final state that agrees with the generated symbolic state: 


Theorem sym_exec_snd: 
  ∀ (p: block) (k: nat) (ops: stack_op_map) (sst: sstate), 
    valid_stack_op_map ops → 
    sym_exec p k ops = Some sst → 
    valid_sstate sst ∧ 
    ∀ (in_st : state) (in_stk : stack), 
      get_stack in_st = in_stk → 
      length in_stk = k → 
      ∃ (out_st : state), 
        concr_int p in_st ops = Some out_st ∧ 
        eval_sstate in_st sst ops = Some out_st. 


4.2 Simplification Rules 


To capture equivalence of programs that have been optimized according to “rule 
simplifications” (type (i) in Sect. 2) we need to include the same type of simpli- 
fications (see App. A in [10]) in our framework. Without this, we will capture 
EVM-blocks equivalence only for “data-stack equivalence optimizations” (type (ii) 
in Sect. 2). 

An optimization function takes as input a symbolic state, and tries to simplify 
it to an equivalent state. E.g., if a symbolic state includes e_i ↦ s3 + 0, we can 
replace it by e_i ↦ s3. The following is the type used for optimization functions: 


Definition optim := sstate → sstate*bool. 


Optimization functions never fail, i.e., in the worst case they return the same 
symbolic state. This is why the returned value includes a Boolean to indicate if 
any optimization has been applied, which is useful when composing optimizations 
later. The soundness of an optimization function can be stated as follows: 


Definition optim_snd (opt: optim) : Prop := 
  forall (sst: sstate) (sst': sstate) (b: bool), 
    valid_sstate sst → opt sst = (sst', b) → 
    (valid_sstate sst' ∧ 
     forall (st st': state), eval_sstate st sst evm_stack_opm = Some st' → 
                             eval_sstate st sst' evm_stack_opm = Some st'). 


We have implemented and proven correct the most-used simplification rules 
(see App. A in [10]). E.g., there is an optimization function optimize_add_zero 
that rewrites expressions of the form E + 0 or 0 + E to E, and its soundness 
theorem is: 


Theorem optimize_add_zero_snd: optim_snd optimize_add_zero. 


Example 3. Consider again the blocks of Example 2. Using optimize_add_zero 
we can rewrite sst2 to sst2' = ([e2, s2], {e2 ↦ 5 + e1, e1 ↦ e0 * s1, e0 ↦ s0}), 
by replacing e0 ↦ 0 + s0 by e0 ↦ s0. 


Note that the checker can be easily extended with new optimization functions, 
simply by providing a corresponding implementation and a soundness proof. 
Optimization functions can be combined to define simplification strategies, which 
are also functions of type optim. E.g., assuming that we have basic optimization 
functions f1,...,fn: (1) Apply f1,...,fn iteratively such that in iteration i function 
fi is applied as many times as it can be applied. (2) Apply each fi once in some 
order and repeat the process as many times as it can be applied. (3) Use the 
simplifications that were used by the optimizer (it needs to pass these hints). 
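To make the shape of optimization functions and of strategy (1) concrete, here is a minimal Python sketch. It is an assumption-laden illustration: the state is reduced to a pair (stack, map), map entries are tagged "op" or "basic" (loosely mirroring SymOp and SymBasicVal), and `apply_exhaustively` and `pipeline` are hypothetical helper names.

```python
from typing import Callable, Dict, List, Tuple

Elem = Tuple[str, int]        # ("val", n) | ("in", i) | ("fresh", i)
SMap = Dict[int, tuple]       # fresh id -> ("op", opcode, args) | ("basic", elem)
SState = Tuple[List[Elem], SMap]
Optim = Callable[[SState], Tuple[SState, bool]]


def optimize_add_zero(sst: SState) -> Tuple[SState, bool]:
    """Rewrite the first binding e_i -> E + 0 or e_i -> 0 + E to e_i -> E."""
    sstk, smap = sst
    for i, entry in smap.items():
        if entry[0] == "op" and entry[1] == "ADD":
            a, b = entry[2]
            if a == ("val", 0) or b == ("val", 0):
                other = b if a == ("val", 0) else a
                new_map = dict(smap)            # never mutate the input state
                new_map[i] = ("basic", other)
                return (sstk, new_map), True
    return sst, False                           # never fails: same state back


def apply_exhaustively(opt: Optim) -> Optim:
    """Apply `opt` until it reports that nothing changed."""
    def run(sst: SState) -> Tuple[SState, bool]:
        changed = False
        while True:
            sst, flag = opt(sst)
            if not flag:
                return sst, changed
            changed = True
    return run


def pipeline(opts: List[Optim]) -> Optim:
    """Strategy (1): apply f1, ..., fn in order, each as often as possible."""
    def run(sst: SState) -> Tuple[SState, bool]:
        changed = False
        for opt in opts:
            sst, flag = apply_exhaustively(opt)(sst)
            changed = changed or flag
        return sst, changed
    return run


# sst2 from Example 2, rewritten as in Example 3.
sst2 = ([("fresh", 2), ("in", 2)],
        {0: ("op", "ADD", [("val", 0), ("in", 0)]),
         1: ("op", "MUL", [("fresh", 0), ("in", 1)]),
         2: ("op", "ADD", [("val", 5), ("fresh", 1)])})
sst2_simplified, applied = pipeline([optimize_add_zero])(sst2)
```

The Boolean flag is what lets `apply_exhaustively` detect a fixed point without comparing states, which is the role the text attributes to it when composing optimizations.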


4.3 Stacks Equivalence Modulo Commutativity 


We say that two symbolic stacks sst1 and sst2 are equivalent if for every possible 
initial concrete state st they evaluate to the same state. Formally: 


Definition eq_sstate (sst1 sst2: sstate) (ops : stack_op_map) : Prop := 
  ∀ (st: state), eval_sstate st sst1 ops = eval_sstate st sst2 ops. 


However, this notion of semantic equivalence is not computable in general, and 
thus we provide an effective procedure to determine such equivalence by checking 
that at every position of the stack both contain “similar” expressions: 


Definition eq_sstate_chkr (sst1 sst2: sstate) (ops : stack_op_map) : bool := ... 

To determine if two stack elements are similar, we follow their definition in the 
map if needed until we obtain a value that is not a fresh variable, and then either 
(1) both are equal constant values; (2) both are equal initial stack variables; or 
(3) both correspond to the same instruction and their arguments are (recursively) 
equivalent (taking into account the commutativity of operations). E.g., 
the stack elements (viewed as terms) DIV(MUL(s0,ADD(s1,s2)),0x16) and 
DIV(MUL(ADD(s2,s1),s0),0x16) are considered equivalent because the operations 
ADD and MUL are commutative. 


Example 4. eq_sstate_chkr fails to prove equivalence of sst1 and sst2 of Example 2, 
because, when comparing e2 and e1, it will eventually check if 0 + s0 and s0 
are equivalent. It fails because the comparison is rather “syntactic”. However, it 
succeeds when comparing sst1 and sst2' (Example 3), which is a simplification 
of sst2. 
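The behaviour described in Example 4 can be reproduced with a short Python sketch of a syntactic comparison modulo commutativity. The encoding, the `COMMUTATIVE` set, and the function names are assumptions for illustration; the verified checker takes commutativity information from the stack operations map instead.

```python
from typing import Dict, List, Tuple

Elem = Tuple[str, int]     # ("val", n) | ("in", i) | ("fresh", i)
SMap = Dict[int, tuple]    # fresh id -> ("op", opcode, args) | ("basic", elem)

COMMUTATIVE = {"ADD", "MUL"}   # hypothetical: operations declared commutative


def resolve(e: Elem, smap: SMap) -> tuple:
    """Follow fresh variables until reaching a constant, an input, or an operation."""
    while e[0] == "fresh":
        entry = smap[e[1]]
        if entry[0] == "op":
            return entry
        e = entry[1]
    return ("basic", e)


def similar(e1: Elem, m1: SMap, e2: Elem, m2: SMap) -> bool:
    r1, r2 = resolve(e1, m1), resolve(e2, m2)
    if r1[0] == "basic" or r2[0] == "basic":
        return r1 == r2                      # equal constants or equal inputs
    _, op1, a1 = r1
    _, op2, a2 = r2
    if op1 != op2 or len(a1) != len(a2):
        return False
    if all(similar(x, m1, y, m2) for x, y in zip(a1, a2)):
        return True
    return (op1 in COMMUTATIVE and len(a1) == 2      # try swapped arguments
            and similar(a1[0], m1, a2[1], m2)
            and similar(a1[1], m1, a2[0], m2))


def eq_sstate_chkr(s1, s2) -> bool:
    (stk1, m1), (stk2, m2) = s1, s2
    return len(stk1) == len(stk2) and all(
        similar(x, m1, y, m2) for x, y in zip(stk1, stk2))


sst1 = ([("fresh", 1), ("in", 2)],
        {0: ("op", "MUL", [("in", 1), ("in", 0)]),
         1: ("op", "ADD", [("fresh", 0), ("val", 5)])})
sst2 = ([("fresh", 2), ("in", 2)],
        {0: ("op", "ADD", [("val", 0), ("in", 0)]),
         1: ("op", "MUL", [("fresh", 0), ("in", 1)]),
         2: ("op", "ADD", [("val", 5), ("fresh", 1)])})
sst2_simplified = (sst2[0], {**sst2[1], 0: ("basic", ("in", 0))})

before = eq_sstate_chkr(sst1, sst2)             # stuck on 0 + s0 versus s0
after = eq_sstate_chkr(sst1, sst2_simplified)   # succeeds after add-zero rewrite
```

The sketch never evaluates anything, so it can only report a false negative, never a false positive, provided the operations listed as commutative really are.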


This procedure is an approximation of the semantic equivalence, and it can 
produce false negatives if two symbolic states are equivalent but are expressed 
with different syntactic constructions. However, it is sound: 


Theorem eq_sstate_chkr_snd: 
  ∀ (sst1 sst2: sstate) (ops : stack_op_map), 
    valid_stack_op_map ops → valid_sstate sst1 → valid_sstate sst2 → 
    eq_sstate_chkr sst1 sst2 ops = true → eq_sstate sst1 sst2 ops. 


Note that we require the stack operations map to be valid in order to guaran- 
tee that the operations declared commutative in ops are indeed commutative. In 
order to reduce the number of false negatives, the simplification rules presented 
in Sect. 4.2 are very important to rewrite symbolic states into closer syntactic 
shapes that can be detected by eq_sstate_chkr. 

Finally, given all the pieces developed above, we can now define the block 
equivalence checker as follows: 


Definition evm_eq_block_chkr (opt: optim) (p1 p2: block) (k: nat) : bool := 
  match sym_exec p1 k evm_stack_opm with 
  | None => false 
  | Some sst1 => 
    match sym_exec p2 k evm_stack_opm with 
    | None => false 
    | Some sst2 => let (sst1', _) := opt sst1 in 
                   let (sst2', _) := opt sst2 in 
                   eq_sstate_chkr sst1' sst2' evm_stack_opm 
    end 
  end. 


It symbolically executes p1 and p2, simplifies the resulting symbolic states by 
applying optimization opt, and finally calls eq_sstate_chkr to check if the states 
are equivalent. Note that it is important to apply the optimization rules to both 
blocks, as the checker might apply optimization rules that were not applied by the 
external optimizer. This would lead to equivalent symbolic states with different 
shapes that will not be detected by the symbolic state equivalence checker. 




Table 1. Summary of experiments using GASOL. 

| Criterion | SIMP | #blocks | CHKR Yes | CHKR Time | CHKR* Yes | CHKR* Time |
|-----------|------|---------|----------|-----------|-----------|------------|
| GAS       | ×    | 36624   | 36624    | 2.60      | 36624     | 11.76      |
| GAS       | ✓    | 43228   | 27149    | 4.69      | 43109     | 14.09      |
| SIZE      | ×    | 35754   | 35754    | 2.57      | 35754     | 12.59      |
| SIZE      | ✓    | 32192   | 31488    | 2.50      | 31798     | 12.17      |


The above checker is sound when opt is sound: 


Theorem evm_eq_block_chkr_snd: 
  ∀ (opt: optim), optim_snd opt → eq_block_chkr_snd (evm_eq_block_chkr opt). 


5 Implementation and Experimental Evaluation 


The different components of the tool have been implemented in Coq v8.15.2, 
together with complete proofs of all the theoretical results (more than 180 proofs 
in ~7000 lines of Coq code). The source code, executables and benchmarks 
can be found at https://github.com/costa-group/forves/tree/stack-only and the 
artifact at https://doi.org/10.5281/zenodo.7863483. The tool currently includes 
15 simplification rules (see App. A in [10]). We have tried our implementation 
on the outcome of two optimization tools: (1) the standalone GASOL optimizer 
and (2) the optimizer integrated within the official Solidity compiler solc. For 
(1), we have fully automated the communication between the optimizer 
and the checker and have been able to perform a thorough experimental evaluation. 
For (2), the communication is more difficult to automate because the CFG 
of the original program can change after optimization, i.e., the optimizer can 
perform cross-block optimizations. Hence, in this case, we have needed human intervention to 
disable intra-block optimizations and obtain the blocks for the comparison (we 
plan to automate this usage in the future). To evaluate (2) we have used as 
benchmarks 1,280 blocks extracted from the smart contracts in the semantic 
test suite of the solc compiler [6], and we proved equivalence for 1,045 
of them. We have checked that the failures are due to the use of optimization rules 
not yet implemented by us. As these blocks are obtained from the test suite of 
the official solc Solidity compiler, optimized using the solc optimizer, the good 
results on this set suggest that the approach generalizes to other optimizers. We now 
describe in detail the experimental evaluation on (1), for which we have used as 
benchmarks 147,798 blocks belonging to 96 smart contracts (see App. D in [10]). 

GASOL allows enabling/disabling the application of simplification rules and 
choosing an optimization criterion: GAS consumption or bytes SIZE (of the 
code) [11]; combining these parameters we obtain 4 different sets of pairs-of-blocks 
to be verified by our tool. From these blocks, we consider only those that 
were actually optimized by GASOL, i.e., the optimized version is syntactically 
different from the original one. In all cases, the average size of blocks is 8 
instructions. Table 1 summarizes our results, where each row corresponds to one 
setting out of the 4 mentioned above: Column 1 includes the optimization criterion; 
Column 2 indicates if rule simplifications were applied by GASOL; Column 3 
indicates how many pairs-of-blocks were checked; Columns 4-7 report the results 
of applying 2 versions of the checker, namely CHKR corresponds to the checker 
that only compares symbolic states and CHKR* corresponds to the checker that 
also applies all the implemented rule optimizations iteratively as much as they 
can be applied (see Sect. 4.2). For each we report the number of instances it 
proved equivalent and the total runtime in seconds. The experiments have been 
performed on a machine with an Intel i7-4790 at 3.60 GHz and 16 GB of RAM. 
For sets in which GASOL does not apply simplification rules (marked with ×), 
both CHKR and CHKR* succeed in proving equivalence of all blocks. When simplifications 
are applied (marked with ✓), CHKR* succeeds in 99% of the blocks 
while CHKR ranges from 63% for GAS to 99% for SIZE. This difference is due 
to the fact that GASOL requires the application of rules to optimize more blocks 
wrt. GAS (~37% of the total) than wrt. SIZE (~1%). Moreover, all the blocks 
that CHKR* cannot prove equivalent have been optimized by GASOL using rules 
which are not currently implemented in the checker, so we predict a success rate 
of 100% when all the rules in App. A in [10] are integrated. Regarding time, 
CHKR* is 3-5 times slower than CHKR because of the overhead of applying 
rule optimizations, but it is still very efficient (all 147,798 instances are checked 
in 50.61 seconds). As a final comment, thanks to the checker we found a bug in 
the parsing component of GASOL, which treated the SGT bytecode as GT. The 
bug was directly reported to the GASOL developers and is already fixed [19]. 
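The aggregate figures quoted above can be recomputed from Table 1 with a few lines of Python. The sketch assumes the reading of the table in which the first two rows are the GAS setting and the last two the SIZE setting, which is the assignment consistent with the percentages discussed in the text; variable names are illustrative.

```python
# Rows of Table 1: (criterion, rules applied by GASOL,
#                   #blocks, CHKR yes, CHKR time [s], CHKR* yes, CHKR* time [s])
rows = [
    ("GAS",  False, 36624, 36624, 2.60, 36624, 11.76),
    ("GAS",  True,  43228, 27149, 4.69, 43109, 14.09),
    ("SIZE", False, 35754, 35754, 2.57, 35754, 12.59),
    ("SIZE", True,  32192, 31488, 2.50, 31798, 12.17),
]

total_blocks = sum(r[2] for r in rows)
total_time_star = round(sum(r[6] for r in rows), 2)

# Per-setting success rates (in percent) and slowdown of CHKR* over CHKR.
stats = []
for crit, simp, n, yes, t, yes_star, t_star in rows:
    stats.append((crit, simp,
                  round(100 * yes / n, 1),
                  round(100 * yes_star / n, 1),
                  round(t_star / t, 1)))

for crit, simp, rate, rate_star, slowdown in stats:
    print(f"{crit:4} rules={simp!s:5} CHKR {rate:5.1f}%  "
          f"CHKR* {rate_star:5.1f}%  slowdown {slowdown:.1f}x")
print(total_blocks, "blocks,", total_time_star, "s for CHKR*")
```

The totals reproduce the 147,798 instances and the 50.61 seconds reported for CHKR*, and the per-row slowdowns fall in the stated 3-5x range.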


6 Conclusions, Related and Future Work 


Our work provides the first tool able to formally verify the equivalence of jump-free 
EVM blocks and has required the development of all components within the 
verification framework. The implementation is not tied to any specific tool and 
could be easily integrated within any optimization tool. Ongoing work focuses 
on handling memory and storage optimizations. This extension requires supporting 
the execution of memory/storage operations at the level of the concrete 
interpreter, and designing an efficient data structure to represent symbolic memory/storage. 
Full handling of blockchain-specific opcodes is straightforward: it 
only requires adding the corresponding implementations to the stack operations 
map evm_stack_opm. A more ambitious direction for future work is to handle 
cross-block optimizations. 

There are two approaches to verifying program optimizations: (1) verify the correctness 
of the optimizations and develop a verified tool, e.g., this is the case of 
optimizations within the CompCert certified compiler [24] and a good number of 
optimizations that have been formally verified in Coq [13,18,27,32,33]; or (2) use 
a translation validation approach [20,34–36] in which, rather than verifying the 
tool, each of the compiled/optimized programs is formally checked to be correct 
using a verified checker. We argue that translation validation [34] is the most 
appropriate approach for verifying EVM optimizations because: (i) EVM compilers 
(together with their built-in optimizers) are continuously evolving to adjust to 
modifications in the rather new blockchain programming languages, (ii) existing 
EVM optimizers use external components such as SMT solvers to search for the 
optimized code and verifying an SMT solver would require a daunting effort, (iii) 
we aim at generality of our tool rather than restricting ourselves to a specific 
optimizer and, as already explained, our checker has been designed 
with generality and extensibility in mind, so that new optimizations can be 
easily incorporated. Finally, it is worth mentioning the KEVM framework [21], 
which in principle could be the basis for verifying optimizations as well. However, 
we have chosen to develop our checker in Coq due to its maturity. 

Abstract. Logic locking was designed to be a formidable barrier to 
IP piracy: given a logic design, logic locking modifies the logic design 
such that the circuit operates correctly only if operated with the “cor- 
rect” secret key. However, strong attacks (like SAT-based attacks) soon 
exposed the weakness of this defense. Stripped functionality logic locking 
(SFLL) was recently proposed as a strong variant of logic locking. SFLL 
was designed to be resilient against SAT attacks, which were the bane 
of conventional logic locking techniques. However, all SFLL-protected 
designs share certain “circuit patterns” that expose them to new attacks 
that employ structural analysis of the locked circuits. 

In this work, we propose a new methodology—Structurally Robust 
SFLL (SR-SFLL)—that uses the power of modern satisfiability and syn- 
thesis engines to produce semantically equivalent circuits that are resilient 
against such structural attacks. On our benchmarks, SR-SFLL was able 
to defend all circuit instances against both structural and SAT attacks, 
while all of them were broken when defended using SFLL. Further, we 
show that designing such defenses is challenging: we design a variant of 
our proposal, SR-SFLL(0), that is also robust against existing struc- 
tural attacks but succumbs to a new attack, SYNTAK (also proposed in 
this work). SYNTAK uses synthesis technology to compile SR-SFLL(0) 
locked circuits into semantically equivalent variants that have structural 
vulnerabilities. SR-SFLL, however, remains resilient to SYNTAK. 


Keywords: Logic Locking · SFLL · Program Synthesis 


1 Introduction 


Semiconductor design houses often outsource the fabrication of the integrated 
circuits (IC) to third-party foundries [17]. This allows effective use of the fab- 
rication equipment and facilities at the foundry, while the design houses can 
concentrate solely on the design. Though this separation of concerns provides 
attractive cost benefits, it also opens up certain threats: malicious agents at a 
foundry may now fabricate illegal copies of the ICs that can be sold in the gray 
market leading to serious loss in revenue for the design house. 

Logic locking was proposed as an effective mechanism to combat such intellectual 
property (IP) threats. Logic locking modifies the original IC in such a manner 
that the circuit operates correctly only after it is activated with a secret key. 
This secret key is loaded into tamperproof memory by the design house post-fabrication. 
However, powerful attacks, especially those involving SAT 
solvers [23,26,34], were soon invented to thwart this defense. Since then, more powerful 
defenses have been proposed that are resistant to such SAT attacks. One such 
SAT-resilient defense that has gained a lot of popularity is Stripped Functionality 
Logic Locking (SFLL) [44]. 

Fig. 1. SFLL-HD locked circuit (Y); C is the (unprotected) original circuit. (Color 
figure online) 

SFLL operates by using the secret key to identify a set of inputs as protected 
patterns—the circuit is forced to produce incorrect results if the input matches 
any of these protected patterns. The cube stripping circuit (see Fig. 1) is respon- 
sible for matching the inputs to the protected patterns. An additional restore 
circuit is used to restore the correct functionality for the protected patterns. The 
circuit does not operate correctly with an incorrect key as the restore circuit, 
then, identifies a different set of patterns to be restored. Though quite potent 
against SAT attacks, attackers soon identified certain unique structural patterns 
in the design of SFLL that could be leveraged to build attacks via structural 
analysis [4,32,40]. 

In this work, we propose a scheme, Structurally Robust Stripped Functionality 
Logic Locking (SR-SFLL), to defend against such structural analysis. SR-SFLL 
uses efficient synthesis [33] machinery powered by modern SAT solvers to ensure 
that certain structural security constraints are met, which in turn ensures its resilience 
against the structural attacks. 

SR-SFLL operates as follows: (1) identify a “cut” of the original design C 
to break the design into two segments C1 and C2 (see Fig. 1), and (2) introduce 
a carefully synthesized perturbation unit Q between C1 and C2 (see Fig. 2b). 
As the perturbation unit does not have any specific structural signature and 
is hidden deep within the original design, our scheme is no longer vulnerable to 
attacks by structural analysis. Further, the location of the “cut” is unknown to 
the attacker and the perturbation unit lacks any structural pattern, making it 
challenging to apply other attacks like the removal attack [43]. 

Fig. 2. Transformation of SFLL locked circuit to SR-SFLL locked circuit. 

Table 1. Attack resilience of logic locking techniques: ✓ (resp. ✗) represents resilience 
(resp. vulnerability) to attacks. 

| Attack                | Anti-SAT | SARLock | SFLL | SR-SFLL(0) | SR-SFLL |
|-----------------------|----------|---------|------|------------|---------|
| SAT [34]              | ✓        | ✓       | ✓    | ✓          | ✓       |
| Removal [43]          | ✗        | ✗       | ✓    | ✓          | ✓       |
| AppSAT [30]           | ✗        | ✗       | ✓    | ✓          | ✓       |
| Structural [4,32,40]  | ✗        | ✗       | ✗    | ✓          | ✓       |
| SYNTAK (this paper)   | ✗        | ✗       | ✗    | ✗          | ✓       |

We argue that designing such a defense scheme is non-trivial: we show 
a version (SR-SFLL(0)) of SR-SFLL that is also resistant to structural 
attacks. However, we could design a novel structural attack algorithm, SYN- 
TAK, that breaks SR-SFLL(0): in our experiments, SYNTAK breaks 71.25% of 
SR-SFLL(0) locked benchmark instances. Our attack algorithm, SYNTAK, is a 
novel attack that also uses synthesis machinery to compile an existing circuit to 
a semantically equivalent one that is amenable to structural analysis. However, 
SR-SFLL is robust against SYNTAK. 

Table 1 summarizes the resiliency of various logic locking techniques, with 
the attacks listed for the rows and the defenses in the columns. For a table 
cell (A, D), we use ✗ to show that attack A breaks defense D (in most cases); 
the mark ✓ shows that defense D is robust against attack A. The attack and 
defense techniques marked with a red background are proposed in this paper. 
As SR-SFLL locked circuits remain semantically equivalent to the SFLL locked 
circuits, an SR-SFLL locked circuit provides the same security against the SAT-based 
[34], removal [43], and AppSAT [30] attacks. 

We evaluated SR-SFLL(0), SR-SFLL, and SYNTAK on 80 benchmarks 
from the ISCAS’85 and MCNC benchmark suites with different numbers of key 
inputs and cube stripping functionalities. Our experiments showed that circuits 
locked by SR-SFLL are robust to structural attacks: none of the SR-SFLL 
locked designs could be broken by existing structural attacks (like SFLLUnlock, 
FALL, and GNNUnlock), or by our SYNTAK (also proposed in this work). While 
the structural attacks failed to recover the structural patterns altogether, SYNTAK 
could not break SR-SFLL even over two days for circuits that were 
locked in less than an hour. 

SR-SFLL provides an asymmetric advantage to the defender over the attacker 
on multiple counts: the secret key K used to lock the circuit, knowledge of the 
secret cut where FSC is partitioned, and a much harder synthesis problem (for 
attacks using SYNTAK). 

We make the following contributions in this work: 


— We propose Structurally Robust Stripped Functionality Logic Locking 
(SR-SFLL), a new defense against IP threats. In contrast to SFLL, 
SR-SFLL is not vulnerable to attacks via structural analysis; 

— We propose a new attack, SYNTAK, and show its potency at breaking alter- 
nate structural attack resistant designs (SR-SFLL(0)) that use similar ideas 
as SR-SFLL but are not designed carefully. This shows the non-triviality of 
designing new defenses, and in particular, SR-SFLL; 

— We evaluate SR-SFLL(0), and SR-SFLL on circuits from two benchmark 
suites against existing structural attacks as well as SYNTAK. Our experimen- 
tal results show that SR-SFLL is not vulnerable to structural analysis or 
SYNTAK and the overheads of the technique are low (about 0.18% on aver- 
age over SFLL). 


2 Background 


2.1 Stripped Functionality Logic Locking (SFLL) 


Figure 1 shows a stripped functionality logic locked circuit, y. The original circuit 
(C) takes a set of input bits, X, and produces an output bit, o1. The SFLL locked 
design, y, consumes input bits, X, and a secret key (bits) K to output y. 

The core idea of SFLL is to create a functionality stripped circuit (FSC) that 
would produce incorrect output for certain protected patterns. The cube stripping 
circuit (S) recognizes the protected patterns, and makes the signal sfo high if 
any of them is encountered; an XOR gate flips the output of the original circuit 
(o1) for these protected patterns (i.e. when sfo is high). Hence, o2 is the correct 
output for inputs not in the protected patterns, but the complement of it for the 
protected patterns. 

The correct functionality is re-established using the restore circuit (R). The 
restore circuit accepts (secret) key bits K along with the input X to produce 
the signal sro; if the correct key K is supplied, sro is high if and only if the input 
is amongst the protected bits. The cube stripping circuit (S) is functionally 
equivalent to R but uses a hardcoded key value. Hence, the restore unit restores 
the correct output for the protected inputs (via an XOR of o2 and sro). Hence, 
the locked circuit now works correctly if the correct key is applied (i.e. the key 
K supplied to R matches the hardcoded key in S). 

While many possible choices exist for a function that identifies protected 
patterns of inputs based on a key, the hamming distance was found to be an 
interesting choice [44]. The corresponding variant of SFLL, known as SFLL- 
HD, identifies an input (X) as a protected pattern if it has a certain hamming 
distance (h) from the key (K). 
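The interplay of the cube stripping circuit, the XOR gates, and the restore circuit can be illustrated with a purely functional Python model of SFLL-HD. This is a sketch under stated assumptions: the 3-input majority function standing in for the original circuit C, the chosen secret key, and all function names are hypothetical and only serve the example.

```python
from itertools import product


def hd(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))


def original(x):
    """Hypothetical original circuit C: 3-input majority, output o1."""
    return int(sum(x) >= 2)


def sfll_hd(x, key, secret, h):
    """Functional model of an SFLL-HD locked circuit."""
    sfo = int(hd(x, secret) == h)   # cube stripping circuit S (hardcoded key)
    o2 = original(x) ^ sfo          # FSC output: flipped on protected patterns
    sro = int(hd(x, key) == h)      # restore circuit R (user-supplied key)
    return o2 ^ sro                 # final output


SECRET, H = (1, 0, 1), 1
inputs = list(product((0, 1), repeat=3))

# With the correct key the locked circuit agrees with C on every input.
correct_everywhere = all(
    sfll_hd(x, SECRET, SECRET, H) == original(x) for x in inputs)

# With a wrong key, R restores a different set of patterns, so some outputs differ.
wrong_key = (0, 0, 0)
mismatches = [x for x in inputs
              if sfll_hd(x, wrong_key, SECRET, H) != original(x)]
```

The inputs on which a wrong key fails are exactly those where S and R disagree, i.e. the symmetric difference of the two sets of patterns at Hamming distance h from the respective keys.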


2.2 SFLL Attacks 


SFLL is robust to all known attacks on (conventional) logic locking [4]. How- 
ever, subsequently, many structural attacks were proposed that break SFLL. 
These attacks use one or more of the structural properties exhibited by SFLL 
implementations [4,40]: 


1. As the sfo signal is required to invert the signal from the original circuit C 
(for protected patterns), which, then, has to be reverted by the restore circuit, 
the sfo signal has to be on the boundary of FSC and the restore unit; 

2. sfo does not depend on the key inputs; 

3. sfo has low activity, i.e. it is 0 most of the time; 

4. S and R can be removed from the circuit to restore the functionality of the 
original circuit. 



All the following attacks assume that they know the hamming distance (h) 
used to lock the circuit. 


SFLLUnlock. SFLLUnlock [40] uses the first and the second structural proper- 
ties (see above) to identify a few signals that may be sfo (referred to as candidate 
signals). Next, for each of the candidates, the attack uses the following technique: 
it uses a SAT solver to extract an input such that the candidate signal is 1; if the 
candidate signal is indeed sfo, this input must be a protected pattern (which has 
a hamming distance of h with the correct key). Then, it attempts to identify the 
correct key as follows: 


— use a sequence of bit-flips to identify the bits that are different than the 
correct key using the properties of hamming distance; 

— set up a system of equations to find the unknown key that must have a 
hamming distance of h from the inferred protected patterns. 


The inferred key is, finally, validated using a working circuit as an oracle. 

Functional Analysis Attacks on Logic Locking (FALL). The first step of 
FALL [32] is to identify a set of candidate signals that may be the output of the 
cube stripping circuit i.e. sfo. FALL achieves this by exploiting the first and sec- 
ond vulnerabilities of SFLL. To finalize if a signal is sfo (among these candidate 
signals), FALL derives a set of lemmas that exploit the functional properties 
of hamming distance. FALL proposes three algorithms based on these lemmas 
for a specific range of hamming distance values. For example, the AnalyzeUnate- 
ness algorithm is only applicable when h = 0, Hamming2D is applicable when 
h < |K|/4, and SlidingWindow is for larger hamming distances. 


GNNUnlock. GNNUnlock [4] automates the removal of the cube stripping circuit 
and the restore circuit from the locked circuit to obtain the original circuit. For 
their analysis, the circuit is transformed into a graph representation where the 
nodes of the graph represent the gates, and the edges represent the wires. Each 
node in the graph is associated with a feature vector that contains information 
that describes its characteristics (in-degree, out-degree, type of gate of the node, 
whether the node is connected to key input (K), circuit input (X), or circuit 
output (Y), type of gates appearing in the neighborhood of the node, etc.). 

GNNUnlock uses graph neural networks [45] to train over the nodes of the 
graph to classify the nodes belonging to the original circuit (C), cube stripping 
circuit (S), or restore circuit (R). The final step is to remove the nodes classified 
as part of S and R from the locked circuit obtaining the original circuit C. 


2.3 Analysis of the Structural Attacks on SFLL 


FALL and SFLLUnlock are dependent on finding the output of cube stripping 
circuit sfo. Hence, hiding/removing sfo from the locked circuit will ensure robust- 
ness against such attacks. GNNUnlock works by removing the cube stripping 
circuit and restore circuit from the SFLL locked circuit. Hence, removing/hiding 
a part or whole of the cube stripping circuit from the locked design makes the 
locked design robust to such attacks. 


3 Overview 


3.1 Preliminaries 


Attack Model. We assume that the attacker has access to a functional circuit 
(which can be used as an oracle) and knows the hamming distance (h). 


Graph Representation of Circuit. We work with the circuit in And Inverter 
Graph (AIG) format. An AIG consists of two-input AND gates and NOT gates. 
We construct a graph G from the circuit in AIG format as follows: the gates 
in the circuit map to nodes in G. A wire (or signal) connecting gates maps to 
an edge in the graph. The input and output signals are marked as special nodes. 

Fig. 3. An example of circuit C and its corresponding graph representation. Panel (a): AIG format of the original circuit C. 


If not otherwise specified, we construct the graph of a circuit with the node 
representing the final output signal as the start node (we assume a single output 
bit in this paper for simplicity). Figure 3b shows the graph of the circuit in 
Fig. 3a. 

The distance between two nodes (say g1 and y in Fig. 3b) is the (minimum) 
number of edges in the path(s) from node g1 to y (which is 3, in this case). We 
define the depth, d, of a node n as the maximum distance from the start node 
(y in Fig. 3b) of the graph to n. 

We define a cut on graph G as a partitioning of nodes into two disjoint (con- 
nected) subsets such that the inputs and outputs belong to distinct partitions. 
A cut is defined by a cut-set, a selection of edges (which are said to cross a cut) 
such that its endpoints are in distinct partitions. We define the depth of cut as 
the maximum amongst the depths of the nodes in the subset containing the start 
node. In the rest of the paper, a cut on a circuit refers to a cut on 
the underlying graph. The dotted red lines show a cut at depth 3 in Fig. 3. 
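The notions of distance, depth, and depth of a cut can be made concrete with a short Python sketch. The graph below is a hypothetical example (it is not the circuit of Fig. 3), edges are followed from the start node y towards the inputs, and depth is read as the length of the longest path from the start node; all names are assumptions.

```python
from collections import deque

# Hypothetical circuit graph: each node lists its fan-in nodes; y is the start node.
FANIN = {
    "y":  ["g3", "g2"],
    "g3": ["g2", "g1"],
    "g2": ["x1", "x2"],
    "g1": ["x2", "x3"],
    "x1": [], "x2": [], "x3": [],
}


def distance(src, dst):
    """Minimum number of edges on a path from src to dst (BFS), or None."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in FANIN[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None


def depths(start):
    """Depth of every reachable node: longest path length from the start node."""
    order, seen = [], set()

    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for m in FANIN[n]:
            visit(m)
        order.append(n)              # post-order: fan-ins come before n

    visit(start)
    depth = {n: 0 for n in order}
    for n in reversed(order):        # topological order, start node first
        for m in FANIN[n]:
            depth[m] = max(depth[m], depth[n] + 1)
    return depth


def cut_depth(output_side, depth):
    """Depth of a cut: maximum depth among nodes on the start-node side."""
    return max(depth[n] for n in output_side)


DEPTH = depths("y")
```

In this example `distance("y", "g2")` is 1 because of the direct edge, while the depth of g2 is 2 because the longest path goes through g3; a cut whose output side is {y, g3} has depth 1.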


Notations. We show combinational circuits with n inputs X and m outputs 
Y as boolean functions Y ↦ C(X), where X is an n-bit vector (x1, x2, ..., xn), 
and Y is an m-bit vector (y1, y2, ..., ym). We also use the functional notation, 
C(X), to denote the output of the circuit C, i.e. the signal Y. We use capital 
letters to denote bit-vectors and small letters to denote individual bits. We use 
⊕ to denote the XOR gate and ∘ to denote function (or circuit) composition. 

We use blackboard-bold capital letters for circuits (like C). We use ọ for 
complete SFLL locked designs and ¢ for complete SR-SFLL(0) or SR-SFLL 
locked designs. We use subscripts to denote sub-parts of a circuit. For example, 
if we use C to denote the circuit shown in Fig. 3a, we use Ca and Cy, to denote 
the subcircuits with outputs a (red block) and b (blue block). 


3.2 Approach 


Recall that the known structural attacks on SFLL exploit the structural
characteristics of sfo (see Sect. 2.2). Our defense techniques attempt to synthesize
a circuit that is semantically equivalent to the original circuit but misses these
prominent structural characteristics that make structural attacks feasible.

(a) SFLL locked circuit (blue highlighted part is input for synthesis).
(b) SR-SFLL(0) locked circuit (Q is synthesized).

Fig. 4. Transformation of SFLL locked circuit to SR-SFLL(0) locked circuit. (Color
figure online)


SR-SFLL(0). SR-SFLL(0) identifies a cut on the FSC, through both the
original circuit (C) and the cube stripping circuit (S), as shown by the red-dotted
line in Fig. 1, separating the inputs (X) and the output ($o_2$) of the FSC.
The cut-set is marked by the wires {A, V} (as shown in Fig. 4a).

Next, it synthesizes a perturbation unit Q (as shown in Fig. 4b) that
satisfies the following conditions:


- Q is semantically equivalent to the removed circuit, i.e. $C_2 \oplus S_2$;
- No wire in Q is semantically equivalent to the output of $S_2$ (i.e. sfo).


The first condition ensures soundness, that is the functioning of the new 
circuit is the same as that of the SFLL locked circuit. The second condition 
ensures security as sfo is not present in the new design, and hence, the new 
design misses all the structural characteristics (see Sect. 2.2) that made SFLL 
vulnerable to attacks. 


SYNTAK. SR-SFLL(0) is robust to existing attacks as reverse engineering 
using the sfo signal is not possible anymore. However, in contrast to exist- 
ing attacks that attempt to reverse engineer an existing locked circuit, what if 
we synthesize an alternate, semantically equivalent circuit that has a structure 
amenable to reverse engineering? Our novel attack employs a similar strategy. 
The attack attempts to recover an alternate locked design that exposes the
XOR gate $G_{ro}$ (as shown in Fig. 5b), in which case it becomes easy to identify the
sfo signal: it must be one of i or j. SYNTAK, thus, side-steps the challenge of
reverse engineering the SR-SFLL(0) locked circuit with missing sfo by, instead,
resynthesizing another locked circuit that has an easily identifiable sfo signal.
The algorithm proceeds as follows:


- cut the FSC of the SR-SFLL(0) locked circuit into $FSC_1$ and $FSC_2$;
- synthesize a new circuit $P_i \oplus P_j$ that is semantically equivalent to $FSC_2$.


(a) FSC partitioned into $FSC_1$ and $FSC_2$; $FSC_2$ is given to the synthesis tool.
(b) The circuit returned by SYNTAK. P is the synthesized circuit.

Fig. 5. SYNTAK on SR-SFLL(0) locked circuit.


With sfo clearly identifiable, the existing SFLL attacks now become feasible. 

However, this attack may only succeed if the identified cut is such that $FSC_2$
contains the whole of Q. Hence, the attacker may have to “guess” different cuts
(e.g. by progressively increasing the depth of the cut) till the attack succeeds.
We say that the attack succeeds if any of the existing attacks is able to break
the defense with the identified sfo signal in the resynthesized circuit.

The attack is made easier by the fact that it is not required to select a cut
that exactly isolates Q. The attack will still succeed even if some portion of $C_1$
and $S_1$ enters $FSC_2$ (see Fig. 7b). However, the synthesis of $P_i \oplus P_j$ becomes
increasingly expensive with the increasing size of $FSC_2$.

Further, even with the “right” cut, not all synthesis candidates may yield a 
signal semantically equivalent to sfo. Hence, the attacker needs to correctly guess 
the cut as well as the correct synthesis candidate for a successful attack. However, 
our experiments demonstrate that even with these uncertainties, SYNTAK is able 
to break SR-SFLL(0) in 71.25% of cases. 


SR-SFLL. The primary reason why SYNTAK breaks SR-SFLL(0) is that we
are able to synthesize a new circuit $P_i \oplus P_j$ such that there are two XOR gates at
the end of the circuit. If, instead, the newly synthesized circuit introduces the
functionality of $S_2$ in the middle of C, SYNTAK would not be feasible.
Figure 2b shows our improved design for the SR-SFLL locked circuit. Instead
of resynthesizing the circuits $C_2$ and $S_2$, we place a perturbation unit (Q) in
between $C_1$ and $C_2$. The perturbation unit is constructed so that the resulting
design is semantically equivalent to the original SFLL locked circuit. The shaded
portion, consisting of $S_2$ (that produces sfo) and one of the XOR gates, is
eliminated from the design.

As the attacker is unaware of the location of the perturbation unit, and as
the perturbation unit is not at the end of the circuit, the attacker’s task gets
more challenging: the attacker needs to synthesize a new circuit at the end of the
design with an XOR gate (that would provide access to sfo) that re-establishes
the functionality of both $S_2$ and $C_2$. On the other hand, the defender only has
to synthesize Q to re-establish the functionality of $S_2$.

(a) SFLL locked circuit. (b) SR-SFLL(0) locked circuit (no sfo in Q).
(c) Circuit generated by SYNTAK on the SR-SFLL(0) locked circuit (exposes sfo).
(d) SR-SFLL locked circuit. The synthesized unit is between the circuits $C_1$ and $C_2$.

Fig. 6. Illustration of SFLL, SR-SFLL(0), and SR-SFLL locked circuits along with
SYNTAK on SR-SFLL(0)

SR-SFLL is scalable to large circuits. The scalability of SR-SFLL depends 
on the depth of the cut, as the complexity of our synthesis problem only depends 
on the circuit that is subjected to (semantically equivalent) rewriting (C2 and 
S2 in Fig. 2b). Hence, the size of the base circuit has no impact on the scalability 
of SR-SFLL. 


Example. Figure 6a shows the SFLL locked version of a circuit. The
SR-SFLL(0) locked version is shown in Fig. 6b: we can see that the sfo signal
(available in the SFLL locked circuit) is not available in the SR-SFLL(0)
locked circuit anymore; hence, it is robust to existing structural attacks. Applying
SYNTAK (Fig. 6c), however, recovers the sfo signal in the synthesized circuit.
Finally, Fig. 6d shows the SR-SFLL locked circuit: it is structurally robust
(does not include sfo) and does not succumb to SYNTAK.

SR-SFLL provides a stronger asymmetric advantage than SR-SFLL(0):
in SR-SFLL(0), both the attack and the defense need to resynthesize the
functionalities of $C_2$ and $S_2$ within Q. This prevents the defense from taking deep cuts
for FSC, as the task of synthesizing Q must remain feasible. Hence, SR-SFLL(0) only
holds the advantage of knowing the secret “cut”. On the other hand, SR-SFLL
only needs to synthesize the functionality of $S_2$, while the attacker would need
to resynthesize the functionalities of both $C_2$ and $S_2$ to recover sfo, making the
synthesis task overly challenging. This gives SR-SFLL a dual advantage: the
knowledge of the secret cut as well as an asymmetric advantage in the synthesis
task.


4 SR-SFLL 


4.1 Problem Statement 


Given an SFLL locked circuit $\psi(X, K)$ (where X is the input to the circuit and
K is the key-bits used in the circuit), synthesize a structurally robust locked
circuit $\varphi(X, K)$, such that:


(correctness) The altered circuit is semantically equivalent to the original
SFLL locked circuit, that is,

$$\forall X.\ \forall K.\ \psi(X, K) = \varphi(X, K) \tag{1}$$

(security) There does not exist any signal, z, in the altered circuit that is
equivalent to sfo in $\psi$; that is,

$$\forall z.\ \exists X.\ \psi_{sfo}(X) \neq \varphi_z(X) \tag{2}$$

The first condition ensures that functionality is preserved, that is, the
synthesized circuit $\varphi$ preserves the properties of the input SFLL locked circuit $\psi$.
The second condition ensures that structural patterns that were available to
attackers in SFLL, made available through the sfo signal, are not available in $\varphi$.
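
For very small circuits, the two conditions can be checked by exhaustive enumeration. The sketch below is only an illustration under stated assumptions: circuits are modelled as plain Python functions over bit tuples, the toy circuits `psi`, `phi`, `sfo`, and `restore` are hypothetical, and the signals of the altered circuit are assumed to depend on X only. It is not the synthesis-based check used in the paper and does not scale beyond a handful of bits.

```python
from itertools import product

def all_bits(n):
    """All n-bit vectors as tuples of 0/1."""
    return list(product([0, 1], repeat=n))

def is_correct(psi, phi, n_inputs, n_keys):
    """Correctness condition: outputs agree for every input X and key K."""
    return all(psi(x, k) == phi(x, k)
               for x in all_bits(n_inputs) for k in all_bits(n_keys))

def is_secure(sfo, phi_signals, n_inputs):
    """Security condition: every signal z differs from sfo on some input X."""
    rows = all_bits(n_inputs)
    target = [sfo(x) for x in rows]
    return all([z(x) for x in rows] != target for z in phi_signals)

# Hypothetical 2-input example: original circuit x0 AND x1, protected pattern (1, 0).
def sfo(x):
    return int(x == (1, 0))

def restore(x, k):
    return int(x == k)

def psi(x, k):                      # SFLL-style design: C xor sfo xor restore
    return (x[0] & x[1]) ^ sfo(x) ^ restore(x, k)

def phi(x, k):                      # rewritten design: (C xor sfo) collapses to x0
    return x[0] ^ restore(x, k)

print(is_correct(psi, phi, 2, 2))
print(is_secure(sfo, [lambda x: x[0], lambda x: x[1]], 2))
```

Here the rewritten design keeps the same input-output behaviour for every key, while none of its listed internal wires computes the protected-pattern detector.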


4.2 Intuition: SR-SFLL 


Current synthesis tools do not scale to the above synthesis task for
the whole locked circuit $\varphi$ (unless the locked circuit is very small). Hence, a
straightforward implementation of the above equations is not feasible.

Instead, we construct the circuit $\varphi$ by synthesizing a “small” circuit Q that
can be introduced within the original circuit C, with $Q \circ C_2$ preserving the
functionality of $S_2 \oplus C_2$.

We use the following (simplified) description to provide the necessary
intuition. Let the functionality of the original circuit (i.e. C) be denoted as f(X),
where X are the circuit inputs. Then, let the stripped functionality circuit (i.e.
S in Fig. 1) be denoted as g, where g is a boolean function that returns true
if and only if it detects the protected input patterns. The functionality of the
circuit $\psi_{o_2}$ in Fig. 1 can then be represented as:

$$(f \oplus g)(X) \tag{3}$$

We “cut” (or partition) f into two functions $f_1$ and $f_2$, such that $f = f_1 \circ f_2$.
Then, we synthesize a perturbation unit (Q), with functional definition h,
such that:

$$(f_1 \circ h \circ f_2)(X) = (f \oplus g)(X) \tag{4}$$

We use the definition of g (detector for protected patterns) as used in Eq. 3.
Now, we need to ensure the equivalence of $(f \oplus g)$, i.e. $(f_1 \circ (f_2 \oplus g))$, with
that of $(f_1 \circ h \circ f_2)$. This can be ensured by simply checking for the equivalence of
$(f_2 \oplus g)$ with that of $(h \circ f_2)$. If the selected $f_2$ is “small”, the task of synthesizing h
becomes feasible.
For simplicity, we do not assume the splitting of g in the above discussion,
but our approach allows that.

Algorithm 1: SR-SFLL

1 Input: C, A;
2 S, R ← SFLL(C);
3 C1, C2, S1, S2 ← CUT({C, S}, A);
4 Q ← SYNTHESIZE(C2, S2);
5 if Q == ⊥ then
6     return ⊥
7 end
8 φ ← ASSEMBLE(C1, C2, S1, R, Q);
9 return φ(X, K);


4.3 Methodology: SR-SFLL 


Algorithm 1 takes the original circuit (C), and a choice for the cut (A). It first 
generates an SFLL locked circuit (Line 2), thereby generating the stripped func- 
tionality circuit (S(X)), and the restore unit circuit (R(X, K)). 


Identify Cut. The circuit is “cut” (according to A) to partition the original
circuit C into segments $C_1$ and $C_2$ in Line 3. Similarly, the cube stripping circuit
S is also partitioned into $S_1$ and $S_2$ in Line 3.


1. The edges at which the circuit is cut in the original circuit C(X) are the
outputs of circuit $A \leftarrow C_1(X)$ and the inputs of circuit $o_1 \leftarrow C_2(A)$.

2. The edges at which the circuit is cut in the stripped functionality circuit S(X)
are the outputs of $V \leftarrow S_1(X)$ and the inputs of circuit $sfo \leftarrow S_2(V)$.


Synthesize Perturbation Unit Q. We introduce a perturbation unit Q
between $C_1$ and $C_2$ such that this modified circuit (see Fig. 2b) satisfies the
correctness and security properties (see Sect. 4.1).

Accordingly, we pose the synthesis conditions for Q as follows:

$$\forall A\ \forall V.\ (C_2(A) \oplus S_2(V)) = C_2(Q(A, V)) \tag{5}$$
$$\forall z\ \exists A\ \exists V.\ Q_z(A, V) \neq S_2(V) \tag{6}$$

Equation 5 imposes the soundness constraint that introducing Q should reinstall
the functionality of $C_2$. Equation 6 is the security constraint against structural
attacks that ensures that none of the signals (z) in Q is equivalent to sfo.
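
To make the soundness constraint concrete, the following sketch builds Q as a lookup table for a tiny $C_2$ by picking, for every pair (A, V), some vector A' whose image under $C_2$ is the required (possibly flipped) output. This is a brute-force illustration under assumed toy functions, not the Sketch-based synthesis used in the paper; it ignores the security constraint and represents Q as a table rather than as a circuit. The name `synthesize_q` is hypothetical.

```python
from itertools import product

def synthesize_q(c2, s2, n_a, n_v):
    """Return a table Q[(A, V)] -> A' with C2(A') == C2(A) ^ S2(V),
    or None (standing in for failure) if a required output has no preimage."""
    preimage = {}
    for a in product([0, 1], repeat=n_a):
        preimage.setdefault(c2(a), a)      # remember one input per output value
    q = {}
    for a in product([0, 1], repeat=n_a):
        for v in product([0, 1], repeat=n_v):
            target = c2(a) ^ s2(v)
            if target not in preimage:
                return None
            # Pass A through unchanged when no flip is needed; otherwise reroute it.
            q[(a, v)] = a if target == c2(a) else preimage[target]
    return q

# Hypothetical sub-circuits on 2-bit cut signals.
c2 = lambda a: a[0] & a[1]
s2 = lambda v: v[0] ^ v[1]
q = synthesize_q(c2, s2, 2, 2)
print(q[((1, 1), (1, 0))])
```

A table of this kind exists exactly when $C_2$ can produce both output values whenever $S_2$ can request a flip, which is one way to see why a poorly chosen cut can make the synthesis fail.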

Our algorithm is not complete, that is, our synthesis conditions are stronger
than necessary: the signals A and V are universally quantified over all possible
values, while Q needs to satisfy these conditions only on the possible outputs of
$C_1$ and $S_1$, respectively. Our formulation trades off completeness for scalability.

If Algorithm 1 fails to synthesize a locked circuit (i.e., the algorithm
returns $\bot$), the algorithm is run again with a different choice for the cut (A).


Theorem 1. If Algorithm 1 succeeds (that is, does not return $\bot$), the returned
locked circuit $\varphi(X, K)$ is both correct and secure.


Proof. For Algorithm 1 to succeed, the SYNTHESIZE function must succeed.
SYNTHESIZE succeeds only if the synthesized Q satisfies Eq. 5 and Eq. 6.


- Correctness. As sfo is part of $S_2$, from Eq. 5 and Fig. 2b, Eq. 1 holds
whenever Eq. 5 holds.
- Security. From Eq. 6 and Fig. 2b, if Eq. 6 holds, so must Eq. 2.


SR-SFLL(0). In case of SR-SFLL(0), we only attempt to synthesize Q to
replace the circuits $C_2$ and $S_2$ (Fig. 4b) instead of synthesizing a new circuit
between $C_1$ and $C_2$. Hence, in this case, the synthesis condition reduces to:

$$\exists Q\ \forall A\ \forall V.\ (C_2(A) \oplus S_2(V)) = Q(A, V) \tag{7}$$


Circuit Optimization. The circuit may be subjected to optimizations (e.g. 
using berkeley-abc [1]); however, in that case, the security check (Eq. 2) needs 
to be repeated on the optimized circuit to ensure that the optimizations did not 
restore the sfo signal. In our experiments, we did perform optimizations on our 
circuits, and in no case did the security check fail post-optimization. 


5 SYNTAK 


Algorithm 2 accepts the locked circuit $\varphi$ and returns the secret key, Ke. We use
two hyperparameters: the number of attempts at creating cuts (ncuts) and
the number of synthesis candidates to enumerate (nsynth).

At Line 3, the algorithm uses structural analysis to identify the functionality 
stripped circuit FSC and the restore unit R. Identifying R is reasonably simple 
as it is the only part of the locked circuit that uses the key bits K. Hence, one 
can perform dependency analysis from the key bits to identify R (as also done 
in prior work [32, 40]). 
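
The dependency analysis can be sketched as a fixed-point computation over the netlist. The fan-in map and the signal names below are hypothetical, and the function is a minimal illustration rather than the implementation used by the cited tools.

```python
def key_dependent_gates(fanin, key_inputs):
    """Gates whose transitive fan-in contains at least one key input."""
    tainted = set(key_inputs)
    changed = True
    while changed:                      # iterate until no new gate is marked
        changed = False
        for gate, sources in fanin.items():
            if gate not in tainted and any(s in tainted for s in sources):
                tainted.add(gate)
                changed = True
    return tainted - set(key_inputs)

# Hypothetical netlist: n1 belongs to the FSC, r1 and r2 to the restore unit,
# and y is the final XOR that combines both parts.
NETLIST = {
    "n1": ["x1", "x2"],
    "r1": ["x1", "k1"],
    "r2": ["r1", "k2"],
    "y": ["n1", "r2"],
}
print(sorted(key_dependent_gates(NETLIST, {"k1", "k2"})))
```

Note that this marks everything downstream of the key bits, including the final XOR gate; in this toy netlist the restore unit would be the marked gates other than that output gate.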

Next, the algorithm enters a loop to guess a suitable cut (Line 5). If a new cut
(different from the cuts obtained so far, accumulated in cuts (Line 9)) is found,
it attempts to enumerate synthesis candidates. For every synthesis candidate P
(Line 12), the algorithm assembles the complete circuit (Line 17) as per Fig. 5b.

Algorithm 2: SYNTAK

 1 Input: φ, ncuts, nsynth;
 2 cuts ← ∅;
 3 FSC, R ← STRUCTANALYSE(φ);
 4 while |cuts| < ncuts do
 5     FSC1, FSC2 ← CUT(FSC, cuts);
 6     if FSC2 == ⊥ then
 7         break;
 8     end
 9     cuts ← cuts ∪ {(FSC1, FSC2)};
10     synths ← ∅;
11     while |synths| < nsynth do
12         P ← SYNTHESIZE(FSC2, synths);
13         if P == ⊥ then
14             break;
15         end
16         synths ← synths ∪ {P};
17         ψ ← ASSEMBLE(FSC1, P, R);
18         Ke ← ATTACKWITHSFO(ψ, {i, j});
19         if Ke ≠ ⊥ then
20             return Ke;
21         end
22     end
23 end
24 return ⊥;


(a) $FSC_2$ does not contain Q. (b) $FSC_2$ contains Q.

Fig. 7. SYNTAK will not succeed with (a) but may succeed with (b) (cuts shown by
blue boxes). (Color figure online)


Then, it launches an existing structural attack (like FALL, SFLLUnlock) with
the signals {i, j} as potential candidates for the sfo signal (Line 18). If the
existing attacks succeed, the respective key Ke is returned.

The SYNTHESIZE procedure synthesizes $(i, j) \leftarrow P(A)$, such that:

$$\forall A.\ FSC_2(A) = (P_i(A) \oplus P_j(A)) \tag{8}$$

That is, it searches for a circuit P that is semantically equivalent to $FSC_2$ such
that it exposes the sfo signal. This imposition is due to the fact that a new XOR
gate, $G_{ro}$ (circled XOR gate in Fig. 5b), is forced on the output of P; this is an
attempt to make the new circuit resemble the SFLL circuit in Fig. 1, on which
the existing structural attacks are potent.

However, the algorithm is not complete due to multiple factors:


- The choice of the cut is crucial; the attack only works if $FSC_2$ is such that
the perturbation unit of the locked circuit $\varphi$ is a part of $FSC_2$. The attacker
thus faces the challenging task of locating Q (of SR-SFLL(0))
in the locked circuit $\varphi$. However, the attack is made a bit easier by the fact
that it is not required to select a cut that exactly isolates Q (Fig. 7b). The
attack will still succeed even if some portion of $C_1$ and $S_1$ enters P. However,
the synthesis phase of the attack gets more expensive with the size of $FSC_2$; thus,
an overly large $FSC_2$ will not succeed either (it will fail in SYNTHESIZE).

- Not every synthesis candidate that satisfies Eq. 8 yields sfo: there may
be multiple possible instantiations of P, in some of which neither i nor j is sfo.

- The synthesis condition (Eq. 8) is overly strong: the synthesized candidate P
is only required to satisfy the condition for the feasible values of A emanating from
$FSC_1$ and of sro from R (in fact, A and sro are correlated as both depend on
X). However, the synthesis condition forgoes this precondition for scalability
and universally quantifies the condition on all possible values of A and sro.
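
The second point above can be seen on truth tables: for any choice of one output, the other output is fixed by the XOR condition, so a function on k inputs admits $2^{2^k}$ decompositions, and only a few of them expose a given signal. The sketch below enumerates them for a hypothetical 2-input example; the function names and the toy `fsc2` and `sfo` are assumptions made for illustration.

```python
from itertools import product

def xor_decompositions(fsc2, n_inputs):
    """Yield every truth-table pair (P_i, P_j) with P_i ^ P_j == FSC2."""
    rows = list(product([0, 1], repeat=n_inputs))
    target = tuple(fsc2(a) for a in rows)
    for p_j in product([0, 1], repeat=len(rows)):
        p_i = tuple(t ^ b for t, b in zip(target, p_j))
        yield p_i, p_j

# Hypothetical example: FSC2 computes a0, and sfo detects the pattern (1, 0).
rows = list(product([0, 1], repeat=2))
fsc2 = lambda a: a[0]
sfo = tuple(int(a == (1, 0)) for a in rows)

pairs = list(xor_decompositions(fsc2, 2))
exposing = [pair for pair in pairs if sfo in pair]
print(len(pairs), len(exposing))
```

In this toy case only 2 of the 16 decompositions place sfo on one of the two XOR inputs, which illustrates why the attacker may need to enumerate several candidates.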


Even with the above sources of incompleteness, SYNTAK is quite effective in
practice: in our experiments, SYNTAK breaks 71.25% of the SR-SFLL(0) locked
circuits. Our experiments use an incremental approach to guessing cuts that
selects cuts by progressively increasing the depth (d) of the cut in each round;
all nodes that are at a distance of at most d from the output are included in $FSC_2$.
However, other schemes (including randomized ones) are also possible.
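
A minimal sketch of such a depth-based cut is shown below, assuming a hypothetical fan-in map for a toy netlist. It collects every node within d edges of the output on the near side and reports the edges that cross the cut; it does not check the side conditions of a valid cut (for example, that the primary inputs stay on the far side), so it should be read as an illustration only.

```python
from collections import deque

def cut_at_depth(fanin, output, d):
    """Return (far_side, near_side, cut_set) for a cut at distance d."""
    dist = {output: 0}
    queue = deque([output])
    while queue:                                  # BFS from the output
        node = queue.popleft()
        for src in fanin.get(node, []):
            if src not in dist:
                dist[src] = dist[node] + 1
                queue.append(src)
    near = {n for n, k in dist.items() if k <= d}
    far = set(dist) - near
    cut_set = {(src, node) for node in near
               for src in fanin.get(node, []) if src in far}
    return far, near, cut_set

def incremental_cuts(fanin, output, max_depth):
    """Guess cuts by progressively increasing the depth."""
    for d in range(1, max_depth + 1):
        yield d, cut_at_depth(fanin, output, d)

# Hypothetical toy netlist: each node lists the nodes feeding it.
FANIN = {"y": ["a", "b"], "a": ["g1", "x3"], "b": ["x3", "x4"], "g1": ["x1", "x2"]}
for d, (far, near, cut_set) in incremental_cuts(FANIN, "y", 2):
    print(d, sorted(near), sorted(cut_set))
```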


6 Evaluation 


Benchmarks and Setup. We have used 10 circuits from ISCAS’85 [2] and 10
circuits from MCNC [41]. These benchmarks were used for evaluation in most of the
recent work, including SFLL [44], FALL [32], SFLLUnlock [40], and
GNNUnlock [4]. Each of these designs was locked under four different configurations
to produce SFLL-HD locked versions: 16 key-bits with hamming distances of 2 and 4,
and 32 key-bits with hamming distances of 4 and 8. So, overall, we
perform our evaluations on a benchmark suite of 80 circuit instances.

ISCAS’85 benchmarks are available in the bench format and MCNC benchmarks are
available in the blif (Berkeley Logic Interchange Format) format. We used
Berkeley-abc to convert blif to the bench format for use by our framework.

Table 2. Summary of results on all our benchmarks: FL, SU, GU, and SA represent
FALL, SFLLUnlock, GNNUnlock, and SYNTAK respectively. Under Robustness, each
cell in the table shows the number of locked circuits successfully broken by the
respective attack (smaller is better). Under Overhd., we show the average (AVG) and the
standard deviation (STD) of the percentage increase in the number of AND gates in
the AIG w.r.t. the SFLL-HD locked design (smaller is better).

         | SFLL-HD    | SR-SFLL(0)                  | SR-SFLL
         | Robustness | Robustness    | Overhd. %   | Robustness    | Overhd. %
Bench.   | FL  SU  GU | FL  SU  GU  SA | AVG   STD  | FL  SU  GU  SA | AVG   STD
ISCAS    | 40  40  40 | 0   0   0   29 | 0.17  0.10 | 0   0   0   0  | 0.23  0.14
MCNC     | 40  40  40 | 0   0   0   28 | 0.10  0.03 | 0   0   0   0  | 0.12  0.05


We have used the popular SFLL-HD variant of SFLL where the cube strip- 
ping function is the hamming distance between the input and the key bits. 
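
As a point of reference, SFLL-HD is commonly described as flipping the output on all inputs at a fixed hamming distance h from a secret pattern, with a restore unit that applies the same test using the supplied key. The sketch below follows that description for a single-output circuit; it assumes the key has the same width as the input and uses a hypothetical toy circuit, so it is a behavioural model only, not the netlist-level construction.

```python
from itertools import product

def hamming(a, b):
    """Number of positions in which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def sfll_hd(circuit, secret, h):
    """Return locked(x, key): a functionality-stripped circuit plus a restore unit."""
    def strip(x):                   # sfo: 1 exactly on the protected patterns
        return int(hamming(x, secret) == h)

    def restore(x, key):            # sro: 1 where the supplied key marks x as protected
        return int(hamming(x, key) == h)

    def locked(x, key):
        return circuit(x) ^ strip(x) ^ restore(x, key)

    return locked

# Hypothetical 4-input circuit: 1 if at least two inputs are set.
circuit = lambda x: int(sum(x) >= 2)
secret = (1, 0, 1, 1)
locked = sfll_hd(circuit, secret, 1)
inputs = list(product([0, 1], repeat=4))
print(all(locked(x, secret) == circuit(x) for x in inputs))
```

With the correct key the two flips cancel on every input, while a wrong key leaves some inputs corrupted.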

We use a “cut” at depth 4 for selecting $C_2$, both for SR-SFLL(0) and for
SR-SFLL. For SYNTAK, we progressively increase the depth from one till the
attack is successful; we use FALL and SFLLUnlock as the existing attacks on the
circuit resynthesized using SYNTAK. We use a timeout of 1 h for both SR-SFLL(0)
and SR-SFLL; SYNTAK uses a time limit of 2 days.

We built our synthesis engine using Berkeley-abc [1] and the Sketch [33]
synthesizer. Sketch is primarily designed for program synthesis. It discharges
quantified Boolean formulas (QBF) at the backend, to be solved using Berkeley-abc
or Minisat [12]. We found Sketch to be quite an effective tool for our problem.

The rest of our framework is implemented in Python. We use open-source
implementations of SFLLUnlock [40], FALL [32], and GNNUnlock [4] that were
made available by the authors of these tools.

We conduct our experiments on a machine with a 12-core Intel(R) Xeon(R)
Silver E5-2620 CPU @ 2.00 GHz and 32 GB RAM.


Research Questions. Our experiments were designed to answer the following 
research questions: 


1. How do the newly proposed SR-SFLL(0) and SR-SFLL compare with the 
state-of-the-art SFLL-HD on existing attacks (SAT and structural attacks)? 

2. How do SR-SFLL(0) and SR-SFLL stand up to the novel SYNTAK?

3. What is the overhead of SR-SFLL w.r.t. SFLL-HD? 


Both SR-SFLL(0) and SR-SFLL were able to defend against the exist- 
ing attacks: SAT, FALL, SFLLUnlock, and GNNUnlock. However, 71.25% of 
the benchmarks locked using SR-SFLL(0) were broken by SYNTAK, while no 
instance defended by SR-SFLL could be broken by SYNTAK. 

From the AIG of the circuits, we infer that SR-SFLL uses 0.18% (on average)
more AND gates than SFLL-HD locked circuits.

Table 3. Robustness and overhead of SR-SFLL(0) and SR-SFLL with respect to
SFLL-HD locked circuits on a subset of our benchmarks. Benchmark names starting
with “C” are part of ISCAS; the rest are part of the MCNC benchmarks. The mark ✗ indicates
that the attack is successful; ✓ indicates that the attack is not successful.

Bench.  Inst.   | SFLL-HD  | SR-SFLL(0)               | SR-SFLL
        (k, h)  | FL SU GU | FL SU GU SA  Overhd. %   | FL SU GU SA  Overhd. %
C432    (16,2)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✗   0.26        | ✓  ✓  ✓  ✓   0.26
i4      (32,4)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✓   0.12        | ✓  ✓  ✓  ✓   0.18
C880    (32,8)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✗   0.11        | ✓  ✓  ✓  ✓   0.11
apex2   (32,8)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✗   0.11        | ✓  ✓  ✓  ✓   0.11
C499    (16,4)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✗   0.20        | ✓  ✓  ✓  ✓   0.31
C1908   (32,8)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✓   0.11        | ✓  ✓  ✓  ✓   0.11
C1355   (32,4)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✗   0.10        | ✓  ✓  ✓  ✓   0.21
i9      (16,4)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✗   0.14        | ✓  ✓  ✓  ✓   0.21
i7      (16,2)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✓   0.14        | ✓  ✓  ✓  ✓   0.14
C2670   (16,2)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✗   0.31        | ✓  ✓  ✓  ✓   0.31
C3540   (16,2)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✓   0.19        | ✓  ✓  ✓  ✓   0.25
dalu    (32,4)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✓   0.11        | ✓  ✓  ✓  ✓   0.11
frg2    (16,4)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✓   0.12        | ✓  ✓  ✓  ✓   0.17
k2      (16,2)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✗   0.08        | ✓  ✓  ✓  ✓   0.08
i8      (32,4)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✗   0.04        | ✓  ✓  ✓  ✓   0.04
C5315   (32,8)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✗   0.06        | ✓  ✓  ✓  ✓   0.12
seq     (32,8)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✓   0.05        | ✓  ✓  ✓  ✓   0.05
C7552   (16,4)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✗   0.11        | ✓  ✓  ✓  ✓   0.11
C6228   (32,4)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✓   0.08        | ✓  ✓  ✓  ✓   0.16
des     (32,8)  | ✗  ✗  ✗  | ✓  ✓  ✓  ✗   0.05        | ✓  ✓  ✓  ✓   0.03


6.1 Robustness of SR-SFLL(0) and SR-SFLL on Existing Attacks


Table 2 provides a summary of the performance of SFLL-HD, SR-SFLL(0),
and SR-SFLL against existing structural attacks (FALL, SFLLUnlock, and
GNNUnlock) on a representative set of benchmarks: the table shows the number
of instances where the respective attack breaks the defense. While the
structural attacks (FALL, SFLLUnlock, and GNNUnlock) are able to break all
of these instances for SFLL locked circuits, our structurally robust proposals
(SR-SFLL(0) and SR-SFLL) are resilient against these attacks.

Table 3 shows the results on a representative subset of our benchmarks: ✗
marks an instance where the locked circuit gets broken by the
respective attack, and ✓ marks an instance where the
defense successfully defends against the attack. As the primary purpose for the
design of SFLL was to be resilient against SAT attacks, it is not surprising
that the SAT attack times out on all instances of the SFLL locked designs. As
SR-SFLL(0) and SR-SFLL are functionally equivalent to SFLL, they too are
resilient to SAT attacks.

Table 4. Overhead of SR-SFLL(0) and SR-SFLL vs SFLL. Overhead calculated
over SFLL-HD locked circuits shown in Table 3. Benchmark names starting with “C”
are part of ISCAS while the rest are part of MCNC benchmarks.

           Inst.                          SR-SFLL(0)             SR-SFLL
Benchmark  (k, h)   | Original | SFLL-HD | # gates | Overhead% | # gates | Overhead%
C432       (16, 2)  | 209      | 768     | 770     | 0.26      | 770     | 0.26
i4         (32, 4)  | 246      | 1673    | 1675    | 0.12      | 1676    | 0.18
C880       (32, 8)  | 327      | 1754    | 1756    | 0.11      | 1756    | 0.11
C499       (16, 4)  | 400      | 957     | 959     | 0.20      | 960     | 0.31
C1908      (32, 8)  | 414      | 1842    | 1844    | 0.11      | 1846    | 0.11
apex2      (32, 8)  | 445      | 1873    | 1875    | 0.11      | 1875    | 0.11
C1355      (32, 4)  | 504      | 1931    | 1933    | 0.10      | 1935    | 0.21
C2670      (16, 2)  | 717      | 1277    | 1281    | 0.31      | 1281    | 0.31
i9         (16, 4)  | 889      | 1448    | 1450    | 0.14      | 1451    | 0.21
i7         (16, 2)  | 904      | 1463    | 1465    | 0.14      | 1465    | 0.14
C3540      (16, 2)  | 1038     | 1595    | 1598    | 0.19      | 1599    | 0.25
frg2       (16, 4)  | 1164     | 1727    | 1726    | 0.12      | 1727    | 0.17
dalu       (32, 4)  | 1371     | 2799    | 2802    | 0.11      | 2802    | 0.11
C5315      (32, 8)  | 1773     | 3201    | 3203    | 0.06      | 3205    | 0.12
k2         (16, 2)  | 1998     | 2558    | 2560    | 0.08      | 2560    | 0.08
C7552      (16, 4)  | 2074     | 2634    | 2637    | 0.11      | 2637    | 0.11
C6228      (32, 4)  | 2337     | 3765    | 3768    | 0.08      | 3771    | 0.16
seq        (32, 8)  | 2411     | 3837    | 3839    | 0.05      | 3839    | 0.05
i8         (32, 4)  | 3310     | 4737    | 4739    | 0.04      | 4739    | 0.04
des        (32, 8)  | 4123     | 5551    | 5554    | 0.05      | 5553    | 0.03

We also conducted experiments with impractically small key sizes of 5 key bits 
(with hamming distance 2). None of the structural analysis based attacks (FALL, 
SFLLUnlock, and GNNUnlock) could break either SR-SFLL(0) or SR-SFLL 
locked circuits even for these small key sizes. 


6.2 Robustness of SR-SFLL(0) and SR-SFLL on SYNTAK


We apply SYNTAK on SR-SFLL(0) and SR-SFLL locked circuits to evaluate
their robustness against this attack. We “guess” the cut for SYNTAK starting with
a cut at a depth of 1; if the synthesis phase in SYNTAK or the subsequent
structural attack (FALL and SFLLUnlock) fails, we reattempt the attack with
the depth increased by one. We use a timeout of 2 days for SYNTAK.

Against our novel SYNTAK attack, SR-SFLL(0) succumbs in 71.25% of the
cases, but SR-SFLL successfully defends against this attack on all instances.
Table 3 shows the results on some representative benchmarks and Table 2
summarizes the overall results.


6.3 Overhead of SR-SFLL(0) and SR-SFLL


Table 4 shows the overhead (in terms of the number of AND gates in the AIG) 
for SR-SFLL(0) and SR-SFLL over that of SFLL on the benchmarks shown in 
Table 3. Table 2 provides a summary of the overheads over all our benchmarks. 

SR-SFLL(0) has almost no additional overhead (average of about
0.14%) and SR-SFLL also has a very low overhead (average of about 0.18%)
over all our benchmarks. The small difference arises because, while SR-SFLL(0)
essentially rewrites a part of the circuit, SR-SFLL is required to insert
additional machinery to substitute the functionality of $S_2$ within C.
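
The overhead figures follow directly from the AND-gate counts. The short sketch below recomputes them for three rows whose counts are copied from Table 4; the helper name `overhead_percent` is hypothetical.

```python
def overhead_percent(baseline_gates, locked_gates):
    """Percentage increase in AND gates relative to the SFLL-HD baseline."""
    return 100.0 * (locked_gates - baseline_gates) / baseline_gates

# AND-gate counts copied from Table 4: (SFLL-HD, SR-SFLL(0), SR-SFLL).
COUNTS = {
    "C432": (768, 770, 770),
    "i4": (1673, 1675, 1676),
    "C5315": (3201, 3203, 3205),
}

for name, (base, sr0, sr) in COUNTS.items():
    print(f"{name}: SR-SFLL(0) {overhead_percent(base, sr0):.2f}%, "
          f"SR-SFLL {overhead_percent(base, sr):.2f}%")
```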


7 Related Work 


Initial logic locking schemes [7, 10, 11] introduced additional logic and new inputs
to the circuit design in order to obtain the locked circuit. These locked circuits
work correctly when the correct secret key is provided to the circuit by the
designers post IC fabrication. These logic locking techniques are vulnerable to
SAT-based attacks [23, 26, 34]. To overcome the SAT-based attacks, Anti-SAT [39]
and SARLock [42] were proposed. However, Anti-SAT was broken by the SPS [43]
attack, and SARLock was broken by the App-SAT [30] attack.

SFLL-HD [44] introduces a stripped functionality approach for logic locking,
which defends against the above-mentioned attacks. However, it is vulnerable
to FALL [32], SFLLUnlock [40], and GNNUnlock [4].

HOLL [35] exploits the power of program synthesis tools to generate the 
locked circuit by using a “secret” program (using programmable logic like EEP- 
ROM) as the key. As the attacker has to synthesize the “secret” program, HOLL 
becomes challenging to break. However, the requirement of having an embedded 
programming chip makes the approach both complicated and expensive; fur- 
ther, every invocation of the circuit requires the program in the slow EEPROM 
memory to be executed. Our approach, instead, builds on the popular SFLL 
technique and does not need embedded programmable chips. 

Program synthesis has seen significant growth in recent years. Program
synthesis algorithms have powered the synthesis of bit-vector programs
[16], heap-manipulations [13,24,27], language parsers [21,31], semantic actions
in attribute grammars [18], abstract transformers [19], automata [5],
invariants [6,13,20,22], and even differentially private mechanisms [28]. Program
synthesis has also been applied to synthesize bug corpora [29] as well as for
debugging [8,9] and repairing buggy programs, like fixing incorrect heap
manipulations [37,38] or synthesizing relevant fences and/or atomic sections in concurrent
programs under relaxed memory models [36].

There exist boolean functional synthesis tools, like CADET [25],
Manthan [14,15], and BFSS [3], that could have been used for our synthesis task.
However, none of these tools allow us to control the “structure” of the
synthesized formula. Hence, we built our synthesis engine using the Sketch synthesizer,
which is designed for program synthesis.


8 Conclusions 


SR-SFLL provides security against structural analysis based attacks such as
FALL, SFLLUnlock, and GNNUnlock. The core idea used by SR-SFLL is to
use modern synthesis engines to remove the structural patterns that can be exploited
by existing structural analysis based attacks.

SR-SFLL provides an asymmetric advantage to the defender over the 
attacker on many counts: 


- secret key: Similar to SFLL, SR-SFLL uses a secret key to define a set
of protected input patterns. The locked circuit behaves incorrectly when run
with the wrong key;

- secret cut: The cut used to partition the SFLL locked circuit (where the
synthesized component was inserted) is known to the defender but not to the
attacker;

- challenging synthesis task: While the defender is required to synthesize
a smaller circuit that only establishes the functionality of $S_2$, the attacker is
required to synthesize a much larger circuit that reestablishes the
functionalities of both $C_2$ and $S_2$ (see Fig. 2b).


As the perturbation unit resides within the original circuit at a location
unknown to the attacker and has no specific structural signature, structural
analysis of the SR-SFLL locked circuit becomes difficult. Also, as SR-SFLL
locked circuits are functionally equivalent to the respective SFLL locked circuits
(see Eq. 1), SR-SFLL retains all the theoretical robustness properties of SFLL.

Abstract. The simulation of quantum circuits on classical computers is
an important problem in quantum computing. Such simulation requires 
representations of distributions over very large sets of basis vectors, and 
recent work has used symbolic data-structures such as Binary Decision 
Diagrams (BDDs) for this purpose. In this tool paper, we present QUASI- 
MODO, an extensible, open-source Python library for symbolic simula- 
tion of quantum circuits. QUASIMODO is specifically designed for easy 
extensibility to other backends. QUASIMODO allows simulations of quan- 
tum circuits, checking properties of the outputs of quantum circuits, 
and debugging quantum circuits. It also allows the user to choose from 
among several symbolic data-structures—both unweighted and weighted 
BDDs, and a recent structure called Context-Free-Language Ordered 
Binary Decision Diagrams (CFLOBDDs)—and can be easily extended 
to support other symbolic data-structures. 


1 Introduction 


Canonical, symbolic representations of Boolean functions—for example, Binary 
Decision Diagrams (BDDs) [5]—have a long history in automated system design 
and verification. More recently, such data-structures have found exciting new 
applications in quantum simulation. Quantum computers can theoretically solve 
certain problems much faster than traditional computers, but current quan- 
tum computers are error-prone and access to them is limited. The simulation of 
quantum algorithms on classical machines allows researchers to experiment with 
quantum algorithms even without access to reliable hardware. 

Symbolic function representations are helpful in quantum simulation because 
a quantum system’s state can be viewed as a distribution over an exponential- 
sized set of basis-vectors (each representing a “classical” state). Such a state, as 
well as transformations that quantum algorithms typically apply to them, can 
often be efficiently represented using a symbolic data-structure. Simulating an 
algorithm then amounts to performing a sequence of symbolic operations. 

Currently, there are a small number of open-source software systems that 
support such symbolic quantum simulation [1,6,8,13,16]. However, the underlying 
symbolic data-structure can have an enormous effect on simulation performance. 
In this tool paper, we present QUASIMODO,¹ an extensible framework 
for symbolic quantum simulation. QUASIMODO is specifically designed for easy 
extensibility to other backends to make it possible to experiment with a variety of 
symbolic data-structures. QUASIMODO currently supports (i) BDDs [3,5,7], (ii) 
a weighted variant of BDDs [9,14], [19, Ch. 5], and (iii) Context-Free-Language 
Ordered Binary Decision Diagrams (CFLOBDDs) [11], a recent canonical representation 
of Boolean functions that has been shown to outperform BDDs in many 
quantum-simulation tasks. QUASIMODO also has a clean interface that formal-methods 
researchers can use to plug in new symbolic data-structures, which 
helps to lower the barrier to entry for those interested in 
this area. 

Users access QUASIMODO through a Python interface. They can define a 
quantum algorithm as a quantum circuit using 18 different kinds of quantum 
gates, such as Hadamard, CNOT, and Toffoli gates. They can simulate the algo- 
rithm using a symbolic data-structure of their own choosing. Users can sample 
outcomes from the probability distribution computed through simulation, and 
can query the simulator for the probability of a specific outcome of a quan- 
tum computation over a set of quantum bits (qubits). The system also allows 
for a form of correctness checking: users are allowed to ask for the set of all 
high-probability outcomes and to check that these satisfy a given assertion. 

Along with QUASIMODO, we are releasing a suite of 7 established quantum 
algorithms encoded in the input language of QUASIMODO. We hope that these 
algorithms will serve as benchmarks for future research on symbolic simulation 
and verification of quantum algorithms. 


Organization. Section 2 gives an overview of quantum simulation. Section 3 gives 
a user-level overview of QUASIMODO. Section 4 provides background on the sym- 
bolic data-structures available in QUASIMODO. Section 5 describes the program- 
ming model of QUASIMODO, and presents experimental results. Section 6 con- 
cludes. 


2 Background on Quantum Simulation 


Quantum algorithms on quantum computers can achieve polynomial to exponen- 
tial speed-ups over classical algorithms on specific problems. However, because 
so far there are no practical scalable quantum computers, simulation of quan- 
tum circuits on classical computers can help in understanding how quantum 
algorithms work and scale. A simulation of a quantum-circuit computation 
[1,6,8,11,13,19] uses a representation qs of a quantum state and performs operations 
on qs that correspond to quantum-circuit operations (gate applications 
and measurements on qs). 

Simulating a quantum circuit can have advantages compared to executing the 
circuit on a quantum computer. For instance, some quantum algorithms perform 


1 QUASIMODO is available at https://github.com/trishullab/Quasimodo.git. 
 1  import quasimodo  # Python package to import for Quasimodo
 2  epsilon = 1e-8
 3
 4  # number of qubits in the quantum state
 5  numQubits = 2 ** 12
 6  # initialize the quantum state
 7  qs = quasimodo.QuantumState("CFLOBDD", numQubits)
 8  qs.h(0)  # Apply Hadamard gate to Qubit 0
 9  for i in range(1, numQubits):
10      qs.cx(0, i)  # Apply CNOT gate from Qubit 0 to Qubit i
11
12  qubit_mapping = {}  # map from qubit number -> desired outcome
13  for i in range(0, numQubits):
14      qubit_mapping[i] = 1
15
16  # query probability of outcome as encoded in qubit_mapping
17  prob = qs.prob(qubit_mapping)
18  if (abs(prob - 0.5)) < epsilon:
19      print("Circuit is correct")
20  else:
21      print("Incorrect circuit")


Fig. 1. An example of a QUASIMODO program that performs a quantum-circuit compu- 
tation in which the final quantum state is a GHZ state with 4,096 qubits. The program 
verifies that a measurement of the final quantum state has a 50% chance of returning 
the all-ones basis-state. 


multiple iterations of a particular quantum operator Op (e.g., k iterations, where 
$k = 2^j$). A simulation can operate on Op itself [19, Ch. 6], using j iterations of 
repeated squaring to create matrices for $Op^2, Op^4, \ldots, Op^{2^j} = Op^k$. In contrast, 
a physical device must apply Op sequentially, and thus performs Op $k = 2^j$ 
times. 
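
The repeated-squaring idea can be illustrated on a small dense matrix. The following is a minimal sketch in plain Python, not QUASIMODO code: the helper names `mat_mul` and `power_by_squaring` are hypothetical, and a symbolic simulator would carry out the same squaring steps on its decision-diagram representation of Op rather than on explicit lists.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power_by_squaring(op, j):
    """Return op^(2^j) using j matrix squarings."""
    result = op
    for _ in range(j):
        result = mat_mul(result, result)  # op^(2^t) -> op^(2^(t+1))
    return result

# Example operator: a rotation by pi/8, so Op^8 is a rotation by pi.
theta = math.pi / 8
op = [[math.cos(theta), -math.sin(theta)],
      [math.sin(theta), math.cos(theta)]]

j = 3                             # k = 2^j = 8 iterations
fast = power_by_squaring(op, j)   # j = 3 multiplications

# Sequential application, as on a physical device: k - 1 = 7 multiplications.
slow = op
for _ in range(2 ** j - 1):
    slow = mat_mul(slow, op)
```

Both computations yield the same matrix (up to rounding), but the squaring version needs j multiplications instead of k - 1.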

Many quantum algorithms require multiple measurements on the final state. 
After a measurement on a quantum computer, the quantum state collapses to 
the measured state. Thus, every successive measurement requires re-running the 
quantum circuit. However, with a simulation, the quantum state can be preserved 
across measurements, and thus the quantum circuit need only be executed once. 


3 Quasimodo’s Programming and Analysis Interface 


This section presents an overview of QUASIMODO from the perspective of a user 
of the Python API. A user can define a quantum-circuit computation and check 
the properties of the quantum state at various points in the computation. This 
section also explains how QUASIMODO can be easily extended to include custom 
representations of the quantum state. 


Example. Figure 1 shows an example of a quantum-circuit computation writ- 
ten using the QUASIMODO API. To use the QUASIMODO library, one needs to 
import the package, as shown in line 1. A user can then create a program that 
implements a quantum-circuit computation by 


— Initializing the quantum state by making a call to QuantumState with an 
argument that selects the desired backend data-structure and the number 
of qubits in the quantum state. (See line 7.) The example in Fig. 1 uses 
CFLOBDD as the backend simulator, but other data-structures can be used 
by changing the backend parameter to BDD or WBDD. QuantumState sets 
the initial quantum state to the all-zeros basis-state. 

— Applying single-qubit gates to the quantum state, such as Hadamard (h), 
Pauli-X (x), T-Gate (t), and others. The qubit to which they are to be applied 
is specified by passing the qubit number. (See line 8.) 

— Applying multi-qubit gates to the quantum state, such as CNOT (cx), Toffoli 
(ccx), SWAP (swap), and others. The qubits to which they are to be applied 
are specified by passing the qubit numbers. (See line 10.) 


Note that queries on the quantum state do not have to be made only at 
the end of the program; they can also be interspersed throughout the circuit- 
simulation computation. 

QUASIMODO allows different backend data-structures to be used for repre- 
senting quantum states. It comes with BDDs [3,5,7], a weighted variant of BDDs 
[9,14], [19, Ch. 5], and CFLOBDDs [11]. QUASIMODO also provides an interface 
for new backend data-structures to be incorporated by users. All three of the 
standard backends provide compressed representations of quantum states and 
quantum gates, although (as with all variants of decision diagrams) state representations 
may blow up as a sequence of gate operations is performed. 


Quantum Simulation. Quantum simulation problems can be implemented using 
QUASIMODO by defining a quantum-circuit computation, and then invoking the 
API function measure to sample a basis-vector from the final quantum state. For 
instance, suppose that the final quantum state is $[0.5\ 0\ 0.5\ 0.5\ 0\ 0\ 0.5\ 0]^T$. Then 
measure would return a string in the set {000,010,011,110} with probability 
0.25 for each of the four strings. 
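
The sampling behavior described above can be mimicked on an explicit amplitude vector. The sketch below is plain Python and does not call QUASIMODO; `sample_outcome` is a hypothetical stand-in for what measure computes symbolically, and the fixed seed is an assumption made only for reproducibility. It also shows the point made in Sect. 2: the stored state is not destroyed by sampling, so many measurements can be drawn from one simulation.

```python
import random

# Amplitudes of the 3-qubit example state; index b corresponds to the
# basis state written as the 3-bit binary string of b.
amplitudes = [0.5, 0, 0.5, 0.5, 0, 0, 0.5, 0]

def sample_outcome(amps, rng):
    """Sample one basis-state bit-string with probability |amplitude|^2."""
    n_qubits = (len(amps) - 1).bit_length()
    probs = [abs(a) ** 2 for a in amps]
    index = rng.choices(range(len(amps)), weights=probs, k=1)[0]
    return format(index, "0{}b".format(n_qubits))

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
# The state is not consumed by sampling, so repeated measurements reuse it.
samples = [sample_outcome(amplitudes, rng) for _ in range(1000)]
observed = set(samples)
```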


Verification. As shown in line 17 of Fig. 1, QUASIMODO provides an API call to 
inquire about the probability of a specific outcome. The function prob takes as 
its argument a mapping from qubits to {0,1}, which defines a basis-vector e of 
interest, and returns the probability that the state would be e if a measurement 
were carried out at that point. It can also be used to query the probability of a 
set of outcomes, using a mapping of just a subset S of the qubits, in which case 
prob returns the sum of all probabilities of obtaining a state that satisfies S. 
For example, if the quantum state computed by a 3-qubit circuit over $(q_0, q_1, q_2)$ 
is $[0.5\ 0\ 0.5\ 0.5\ 0\ 0\ 0.5\ 0]^T$, the user can query the probability of states satisfying 
$q_1 = 1 \wedge q_2 = 0$ by calling prob({1: 1, 2: 0}), which would return 0.5 
($= \Pr(q_0 = 0 \wedge q_1 = 1 \wedge q_2 = 0) + \Pr(q_0 = 1 \wedge q_1 = 1 \wedge q_2 = 0) = (0.5)^2 + (0.5)^2$). 

Given a relational specification R(x, y) and a quantum circuit y = Q(x), this 
feature is useful for verifying properties of the form “$\Pr[R(x, Q(x))] > \theta$,” where 
$\theta$ is some desired probability threshold for the user’s application. 
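
The semantics of a partial-mapping query can be spelled out on an explicit amplitude vector. This is a minimal sketch in plain Python, not QUASIMODO's implementation (which works on the symbolic representation); the function below merely shares the name prob, and the threshold value is a hypothetical choice for illustration.

```python
def prob(amps, mapping):
    """Sum |amplitude|^2 over basis states consistent with a partial mapping.

    Qubit 0 is taken to be the leftmost (most significant) bit, matching
    the (q0, q1, q2) ordering of the example in the text.
    """
    n_qubits = (len(amps) - 1).bit_length()
    total = 0.0
    for index, amp in enumerate(amps):
        bits = format(index, "0{}b".format(n_qubits))
        if all(bits[q] == str(v) for q, v in mapping.items()):
            total += abs(amp) ** 2
    return total

amplitudes = [0.5, 0, 0.5, 0.5, 0, 0, 0.5, 0]

p = prob(amplitudes, {1: 1, 2: 0})   # sums over states 010 and 110
threshold = 0.4                      # hypothetical probability threshold
meets_spec = p > threshold
```

Here the query sums the probabilities of the two basis states 010 and 110, giving 0.5, and the check against the threshold succeeds.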


Debugging Quantum Circuits. QUASIMODO additionally provides a feature to 
query the number of outcomes for a given probability. This feature is especially 
helpful for debugging large quantum circuits (large in terms of qubit count) 
when most outcomes have similar probabilities. 

Consider the case of a quantum circuit whose final quantum state is intended 
to be $\frac{1}{\sqrt{6}}[1\ 1\ 1\ 0\ 1\ 1\ 1\ 0]^T$. One can check if the final quantum state is the one 
intended by querying the number of outcomes that have probability $\frac{1}{6}$. If the 
returned value is 6, the user can then check if states 011 and 111 have probability 
0 by calling prob({0: 0, 1: 1, 2: 1}) and prob({0: 1, 1: 1, 2: 1}), respectively. 
The API function for querying the number of outcomes that have probability 
$p \pm \epsilon$ is measurement_counts(p, ε). One can also query the number of outcomes 
that have probability > p by invoking the function tail_counts(p). 

QUASIMODO’s API provides the methods get_state() and most_frequent() 
to obtain the quantum state (as a pointer to the underlying data-structure) and 
the outcome with the highest probability, respectively. 
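
The two counting queries can be illustrated on the six-outcome example. The functions below are plain-Python stand-ins that operate on an explicit amplitude vector; they only borrow the names measurement_counts and tail_counts and are not the library's path-counting implementation.

```python
def measurement_counts(amps, p, eps):
    """Number of basis states whose probability lies within eps of p."""
    return sum(1 for a in amps if abs(abs(a) ** 2 - p) <= eps)

def tail_counts(amps, p):
    """Number of basis states whose probability exceeds p."""
    return sum(1 for a in amps if abs(a) ** 2 > p)

scale = 6 ** -0.5
intended = [scale * x for x in [1, 1, 1, 0, 1, 1, 1, 0]]

n_sixth = measurement_counts(intended, 1 / 6, 1e-9)  # outcomes with probability 1/6
n_zero = measurement_counts(intended, 0.0, 1e-9)     # outcomes with probability 0
n_tail = tail_counts(intended, 0.1)                  # outcomes with probability > 0.1
```

For the intended state, six outcomes have probability 1/6 and two (011 and 111) have probability 0; a different count would signal a bug in the circuit.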


3.1 Extending Quasimodo 


The currently supported symbolic data-structures for representing quantum 
states and quantum gates are written in C++ with bindings for Python. All 
of the current representations implement an abstract C++ class that exposes 
(i) QuantumState, which returns a state object that represents a quantum state, 
(ii) eighteen quantum-gate operations, (iii) an operation for gate composition, 
(iv) an operation for applying a gate—either a primitive gate or the result of 
gate composition—to a quantum state, and (v) five query operations. Users can 
easily extend QUASIMODO to add a replacement backend by providing an oper- 
ation to create a state object, as well as implementations of the seventeen gate 
operations and three query operations. Currently, the easiest path is to imple- 
ment the custom representation in C++ as an implementation of the abstract 
C++ class used by QUASIMODO’s standard backends. 


4 The Internals of Quasimodo 


In this section, we elaborate on the internals of QUASIMODO. Specifically, 
we briefly summarize the BDD, WBDD, and CFLOBDD data-structures that 
QUASIMODO currently supports, and illustrate how QUASIMODO performs sym- 
bolic simulation using these data-structures. For brevity, we illustrate the way 
QUASIMODO uses these data-structures using the example of the Hadamard gate, 
a commonly used quantum gate, defined by the matrix $H = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$. 


Binary Decision Diagrams (BDDs). QUASIMODO provides an option to use 
Binary Decision Diagrams (BDDs) [3,5,7] as the underlying data-structure. A 
BDD is a data-structure used to efficiently represent a function from Boolean 
variables to some space of values (Boolean or non-Boolean). The extension of 

Fig. 2. Three representations of the Hadamard matrix $H = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$: (a) a BDD, 
(b) a CFLOBDD, and (c) a WBDD. The variable ordering is $(x_0, y_0)$, where $x_0$ is the 
row decision variable and $y_0$ is the column decision variable. 


BDDs to support a non-Boolean range is called Multi-Terminal BDDs (MTB- 
DDs) [7] or Algebraic DDs (ADDs) [3]. In this paper, we use “BDD” as a generic 
term for both BDDs proper and MTBDDs/ADDs. Each node in a BDD corresponds 
to a specific Boolean variable, and each of the node’s outgoing edges represents a 
decision based on the variable’s value (0 or 1). The leaves of the BDD represent 
the different outputs of the Boolean function. In the best case, BDDs provide 
an exponential compression in space compared to the size of the decision-tree 
representation of the function.² Figure 2(a) shows the BDD representation of the 
Hadamard matrix H with variable ordering $(x_0, y_0)$, where $x_0$ is the row decision 
variable and $y_0$ is the column decision variable. 

We enhanced the CUDD library [12] by incorporating complex numbers at 
the leaf nodes and adding the ability to count paths. 
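
To make the multi-terminal BDD idea concrete, the sketch below builds a tiny reduced decision diagram for H with variable ordering (x0, y0). It is an illustrative toy in plain Python, unrelated to CUDD's actual data structures; the class `DD` and its methods are hypothetical names, and real-valued leaves are used where the modified CUDD package stores complex numbers.

```python
class DD:
    """Tiny multi-terminal decision diagram with hash-consing (toy example)."""

    def __init__(self):
        self.unique = {}   # structural key -> node id (shares equal sub-diagrams)
        self.nodes = {}    # node id -> ("leaf", value) or (var, low, high)

    def leaf(self, value):
        return self._intern(("leaf", value))

    def node(self, var, low, high):
        if low == high:    # don't-care test: both branches agree, skip the node
            return low
        return self._intern((var, low, high))

    def _intern(self, key):
        if key not in self.unique:
            node_id = len(self.unique)
            self.unique[key] = node_id
            self.nodes[node_id] = key
        return self.unique[key]

    def build(self, f, n_vars, assignment=()):
        """Shannon-expand f over variables 0..n_vars-1 in order."""
        if len(assignment) == n_vars:
            return self.leaf(f(*assignment))
        var = len(assignment)
        low = self.build(f, n_vars, assignment + (0,))
        high = self.build(f, n_vars, assignment + (1,))
        return self.node(var, low, high)

    def evaluate(self, root, assignment):
        """Follow the path selected by the assignment down to a leaf."""
        key = self.nodes[root]
        while key[0] != "leaf":
            var, low, high = key
            key = self.nodes[high if assignment[var] else low]
        return key[1]

s = 2 ** -0.5
hadamard = lambda x0, y0: -s if (x0 and y0) else s   # entry H[x0][y0]

dd = DD()
root = dd.build(hadamard, 2)
entry_11 = dd.evaluate(root, (1, 1))
```

In this toy construction the diagram has two leaves and two decision nodes: the test on y0 is skipped when x0 = 0, because both entries of the first row are equal.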


Context-Free-Language Ordered Binary Decision Diagrams (CFLOBDDs). 
CFLOBDDs [11] are a form of binary decision diagram inspired by BDDs, but the two 
data-structures are based on different principles. A BDD is an acyclic finite-state 
machine (modulo ply-skipping), whereas a CFLOBDD is a particular kind of 


2 Technically, the BDD variant that, in the best case, is exponentially smaller than the 
corresponding decision tree, is called a quasi-reduced BDD. Quasi-reduced BDDs are 
BDDs in which variable ordering is respected, but don’t-care nodes are not removed, 
and thus all paths from the root to a leaf have length n, where n is the number of 
variables. However, the size of a quasi-reduced BDD is at most a factor of n+1 larger 
than the size of the corresponding (reduced, ordered) BDD [15, Thm. 3.2.3]. Thus, 
although BDDs can give better-than-exponential compression compared to decision 
trees, at best, it is linear compression of exponential compression. 
single-entry, multi-exit, non-recursive, hierarchical finite-state machine (HFSM) 
[2]. Whereas a BDD can be considered to be a special form of bounded-size, 
branching, but non-looping program, a CFLOBDD can be considered to be a 
bounded-size, branching, but non-looping program in which a certain form of 
procedure call is permitted. 

CFLOBDDs can provide an exponential compression over BDDs and double- 
exponential compression over the decision-tree representation. The additional 
compression of CFLOBDDs can be roughly attributed to the following reasons: 


— As with BDDs, one level of exponential compression comes from sharing in a 
directed-acyclic-graph (i.e., a complete binary tree is folded to a dag). 

— In CFLOBDDs, there is a further level of exponential compression from reuse 
of “procedures”: the same “procedure” can be called multiple times at differ- 
ent call sites. 


Such “procedure calls” allow additional sharing of structure beyond what is 
possible in BDDs: a BDD can share sub-DAGs, whereas a procedure call in 
a CFLOBDD shares the “middle of a DAG”. The CFLOBDD for Hadamard 
matrix H, shown in Fig. 2(b), illustrates this concept: the fork node (the node 
with a split) at the top right of Fig. 2(b) is shared twice: once during the red 
solid path and again during the blue dashed path. The corresponding 
elements of the BDD for H are outlined in red and blue in Fig. 2(a). The cell 
entry H[1][1], which corresponds to the assignment $\{x_0 \mapsto 1, y_0 \mapsto 1\}$, is shown 
in Fig. 2(a) (BDD) and Fig. 2(b) (CFLOBDD) as the paths highlighted in bold 
that lead to the value $-\frac{1}{\sqrt{2}}$. 


Weighted Binary Decision Diagrams (WBDDs). A Weighted Binary Decision 
Diagram (WBDD) [9,14], [19, Ch. 5] is similar to a BDD, but each decision 
(edge) in the diagram is assigned a weight. To evaluate the represented function 
f on a given input a (i.e., a is an assignment in $\{0,1\}^n$), the path for a is 
followed; the value of f(a) is the product of the weights encountered along the 
path. Consider how the WBDD in Fig. 2(c) represents Hadamard matrix H. 
The variable ordering used is $(x_0, y_0)$, where $x_0$ is the row decision variable and 
$y_0$ is the column decision variable. Consider the assignment $a = \{x_0 \mapsto 1, y_0 \mapsto 1\}$. 
This assignment corresponds to the path shown in red in Fig. 2(c). The 
WBDD has a weight $\frac{1}{\sqrt{2}}$ at the root, which is common to all paths. The weight 
corresponding to $\{x_0 \mapsto 1\}$ is 1 and $\{y_0 \mapsto 1\}$ is $-1$; consequently, a evaluates to 
$\frac{1}{\sqrt{2}} \times 1 \times -1 = -\frac{1}{\sqrt{2}}$, which is equal to the value in cell H[1][1]. 

WBDDs have been used in a variety of applications, such as verification and 
quantum simulation [19]. In the case of quantum simulation, the weights on the 
edges of a WBDD are complex numbers. Additionally, the weight on the left- 
hand edge at every decision node is normalized to 1; this invariant ensures that 
WBDDs provide a canonical representation of Boolean functions. We use the 
MQT DD package [19] for backend WBDD support. As distributed, MQT DD 
supports at most 128 qubits; we modified it to support up to $2^{31}$ qubits. 
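
The evaluation rule for WBDDs (multiply the root weight by the edge weights along the selected path) can be sketched as follows. This is a toy encoding in plain Python and not the MQT DD package's data structure; the tuple layout and node names are assumptions, chosen to be consistent with the weights described for the Hadamard example, and the exact shape of the diagram in Fig. 2(c) may differ.

```python
# A node is (variable_index, (weight0, child0), (weight1, child1));
# None plays the role of the terminal node.
s = 2 ** -0.5

y_plus = (1, (1, None), (1, None))     # y0 node reached when x0 = 0
y_minus = (1, (1, None), (-1, None))   # y0 node reached when x0 = 1
x_node = (0, (1, y_plus), (1, y_minus))
root_weight = s                        # common factor 1/sqrt(2) on every path

def evaluate(weight, node, assignment):
    """Multiply the root weight by the edge weights along the chosen path."""
    value = weight
    while node is not None:
        var, edge0, edge1 = node
        edge_weight, node = edge1 if assignment[var] else edge0
        value *= edge_weight
    return value

# Recover the full 2x2 matrix by evaluating every assignment (x0, y0).
H = [[evaluate(root_weight, x_node, (x0, y0)) for y0 in (0, 1)]
     for x0 in (0, 1)]
```

Every left-hand (0) edge carries weight 1, as required by the normalization invariant, and the only sign information sits on the single edge for y0 = 1 under x0 = 1.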


Symbolic Simulation. A symbolic simulation of a quantum-circuit computation 
[11,13,19] uses a symbolic representation qs of a quantum state and performs 
operations on qs that correspond to quantum-circuit operations. 


— A quantum state of n qubits is a vector of size $2^n \times 1$. Its entries are 
called amplitudes, and the vector represents the probability distribution given 
by the squares of the absolute values of the amplitudes. In QUASIMODO, 
CFLOBDDs, BDDs, and WBDDs are used to represent functions of the form 
$f : \{0,1\}^n \to \mathbb{C}$; that is, f is a vector holding complex amplitudes. 

— A quantum gate performs a linear transformation of a quantum state. 
Quantum-gate application is implemented by using a CFLOBDD, BDD, or 
WBDD to represent the matrix describing the quantum gate, and performing 
a matrix-vector multiplication ([11, Sect. 7.6-Sect. 7.7], [3]) of the gate matrix 
and the quantum state. 

— For CFLOBDDs, BDDs, and WBDDs, operations like prob, measurement_counts, 
and tail_counts are implemented as exact operations (i.e., no 
sampling) via projection and path-counting operations ([11, Sect. 7.8], [5]). 
For CFLOBDDs and BDDs, QUASIMODO computes prob via an efficient path- 
counting operation [11, Sect. 7.8.1 and Sect. 10.1.2, respectively] to obtain the 
number of paths leading to each terminal value, and then projects the result 
onto the variables of interest (as specified by the user). QUASIMODO then 
returns the sum of the probabilities of the remaining paths. In the case of 
WBDDs as the backend, QUASIMODO computes the probability of every node 
([19, Ch. 5]) instead of counting paths. To compute measurement_counts, 
QUASIMODO returns the number of paths that lead to the requested probability 
value within the provided threshold ε. On querying tail_counts, 
QUASIMODO returns the number of paths that lead to terminal values having 
probability prob > p, where p is the requested probability. 

— Once path-counts are computed, a measurement from the CFLOBDD, 
BDD, or WBDD symbolic representation of a quantum state is a data- 
structure traversal that can be carried out in time 
O(max(number of qubits in the circuit, size of argument CFLOBDD)). 
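
For reference, the dense-vector semantics that the symbolic backends reproduce in compressed form can be written out directly. The sketch below is plain Python with hypothetical helper names (`apply_h`, `apply_cx`); it stores all 2^n amplitudes explicitly, which is exactly what becomes infeasible at the qubit counts of Fig. 1, so it is run here on a 4-qubit GHZ circuit only.

```python
def apply_h(state, q, n):
    """Apply a Hadamard gate to qubit q (qubit 0 = most significant bit)."""
    s = 2 ** -0.5
    mask = 1 << (n - 1 - q)
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        if amp == 0:
            continue
        if i & mask:                  # |1> -> (|0> - |1>) / sqrt(2)
            new[i ^ mask] += s * amp
            new[i] -= s * amp
        else:                         # |0> -> (|0> + |1>) / sqrt(2)
            new[i] += s * amp
            new[i | mask] += s * amp
    return new

def apply_cx(state, control, target, n):
    """Apply CNOT: flip the target bit of basis states whose control bit is 1."""
    cmask = 1 << (n - 1 - control)
    tmask = 1 << (n - 1 - target)
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        new[i ^ tmask if i & cmask else i] += amp
    return new

n = 4
state = [0j] * (2 ** n)
state[0] = 1 + 0j                     # all-zeros basis state
state = apply_h(state, 0, n)
for t in range(1, n):
    state = apply_cx(state, 0, t, n)

# Probabilities are the squared absolute values of the amplitudes.
probabilities = [abs(a) ** 2 for a in state]
```

The resulting distribution puts probability 0.5 on the all-zeros state and 0.5 on the all-ones state, matching the check performed in Fig. 1.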


5 Experiments 


In this section, we present some experimental results from using QUASIMODO on 
seven quantum benchmarks: Greenberger-Horne-Zeilinger state creation (GHZ), 
the Bernstein-Vazirani algorithm (BV), the Deutsch-Jozsa algorithm (DJ), Simon’s 
algorithm, Grover’s algorithm, Shor’s algorithm (the 2n + 3 qubit circuit of [4]), and 
application of the Quantum Fourier Transform (QFT) to a basis state, for different 
numbers of qubits. Columns 3-5 of Table 1 show the time taken for running 
the benchmarks with CFLOBDDs, BDDs (CUDD 3.0.0 [12]), and WBDDs 
(MQT DD v2.1.0 [17]). For each benchmark and number of qubits, we created 
50 random oracles and report the average time taken across the 50 runs. For 
each run of each benchmark, we performed a measurement at the end of the 
circuit computation and checked if the measured outcome is correct. We ran all 
of the experiments on AWS machines: t2.xlarge machines with 4 vCPUs, 16GB 
memory, and a stack size of 8192KB, running on an Ubuntu OS. 

One sees that CFLOBDDs scale better than BDDs and WBDDs for the 
GHZ, BV, and DJ benchmarks as the number of qubits increases. BDDs perform 
better than CFLOBDDs and are comparable to WBDDs for Simon’s algorithm, 
whereas WBDDs perform better than BDDs and CFLOBDDs for QFT, Grover’s 
algorithm, and Shor’s algorithm. 

We noticed that the BDD implementation suffers from precision issues. If 
an algorithm with a large number of qubits contains too many Hadamard gates, 
each basis state can end up with an extremely low probability value that is 
rounded to 0. Leaves that should hold different minuscule values are then 
coalesced unsoundly, leading to incorrect results. To overcome 
this issue, one needs to increase the precision of the floating-point 
package used to represent BDD leaf values. We increased the precision at 512 
qubits (*) and again at 2048 qubits (**). 
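
The failure mode can be reproduced in isolation. The following sketch uses ordinary IEEE double-precision floats and, as a stand-in for a higher-precision package, Python's exact rational type; it is only an illustration of the underflow and does not reflect how the leaf values are actually stored in the modified CUDD library.

```python
from fractions import Fraction

n_qubits = 4096
# A Hadamard gate on each of n qubits gives every basis state probability 2^-n.
float_prob = 0.5 ** n_qubits                 # IEEE double precision: underflows
exact_prob = Fraction(1, 2) ** n_qubits      # exact rational arithmetic

# Two leaf values that should stay distinct collapse to the same float.
float_a, float_b = float_prob, 3 * float_prob
exact_a, exact_b = exact_prob, 3 * exact_prob

leaves_merge_in_float = (float_a == float_b)
leaves_merge_exactly = (exact_a == exact_b)
```

With double precision both values round to 0 and would be merged into one leaf, whereas the higher-precision representation keeps them apart.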

Part of these results are similar to the work reported in [11]; however, that 
paper did not use QUASIMODO. The results of the present paper were obtained 
using QUASIMODO, and we also report results for WBDDs, as well as BDDs and 
CFLOBDDs (both of which were used in [11]). The numbers given in Table 1 are 
slightly different from those given in [11] because these quantum circuits exclu- 
sively use gate operations that are applied in sequence to the initial quantum 
state. One can rewrite the quantum circuit to first compute various gate-gate 
operations (either Kronecker product or matrix-multiplication operations) and 
then apply the resultant gate to the initial quantum state. For example, consider 
a part of a circuit defined as follows: 


for i in range(0, n): 
    qc.cx(i, n) 


Instead of applying CNOT (cx) sequentially for every i, one can construct a 
gate equivalent to $cx\_op = \prod_{i=0}^{n-1} cx(i, n)$ and then apply cx_op to quantum state 
qc as follows: 


cx_op = qc.create_cx(0, n) 
for i in range(1, n): 
    tmp = qc.create_cx(i, n) 
    cx_op = qc.gate_gate_apply(cx_op, tmp) 
qc.apply_gate(cx_op) 
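
That the two formulations agree can be checked on a small dense model. Because a CNOT gate only permutes basis states, the sketch below represents each gate as a permutation of basis-state indices and composes permutations instead of multiplying matrices. The helper names (`cx_perm`, `compose`, `apply`) are hypothetical, and the code is an analogue of the gate-gate computation in plain Python, not a use of QUASIMODO's API.

```python
def cx_perm(control, target, n_qubits):
    """CNOT as a permutation of basis-state indices (qubit 0 = leftmost bit)."""
    cmask = 1 << (n_qubits - 1 - control)
    tmask = 1 << (n_qubits - 1 - target)
    return [i ^ tmask if i & cmask else i for i in range(2 ** n_qubits)]

def compose(first, second):
    """Permutation that applies `first`, then `second`."""
    return [second[first[i]] for i in range(len(first))]

def apply(perm, state):
    """Move each amplitude to the basis state chosen by the permutation."""
    new = [0] * len(state)
    for i, amp in enumerate(state):
        new[perm[i]] += amp
    return new

n = 3                                  # controls 0..n-1, target qubit n
n_qubits = n + 1
state = [0] * (2 ** n_qubits)
state[0b1110] = 1                      # q0 = q1 = q2 = 1, target q3 = 0

# Gate-by-gate application to the state.
sequential = state
for i in range(n):
    sequential = apply(cx_perm(i, n, n_qubits), sequential)

# Compose the gates first, then apply the combined gate once.
cx_op = cx_perm(0, n, n_qubits)
for i in range(1, n):
    cx_op = compose(cx_op, cx_perm(i, n, n_qubits))
combined = apply(cx_op, state)
```

Both routes end in the basis state 1111 (the target is flipped three times). Which route is cheaper depends on how compactly the intermediate gates and states are represented, which is why the timings in Tables 1 and 2 differ.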


QUASIMODO supports such operations as Kronecker product and matrix 
product of two gate matrices. [11] uses such computations for both oracle con- 
struction and as part of the quantum algorithm. Table 2 shows the results on 
GHZ, BV, and DJ algorithms using the same circuit and oracle construction 
used in [11]. However, Simon’s algorithm, Grover’s algorithm, and Shor’s algo- 
rithm in [11] use operations outside QUASIMODO’s computational model, and 


Table 1. Performance of CFLOBDDs, BDDs, and WBDDs using QUASIMODO, and of other 
simulators (MQT DDSim, Quimb, and Google Tensor Network (GTN)) 


Benchmark #Qubits | CFLOBDD BDD WBDD MQT DDSim Quimb GTN 
Time (sec) Time (sec) Time (sec) Time (sec) Time (sec) Time (sec) 

GHZ 8 0.03 0.007 0.008 0.065 0.255 0.003 

16 0.03 0.008 0.011 0.068 0.368 0.010 

32 0.031 0.008 0.017 0.074 0.932 Memory Error 

64 0.032 0.012 0.03 0.087 3.16 

128 0.035 0.026 0.06 0.116 12.1 

256 0.041 0.1 0.134 Not Supported Memory Error 

512 0.053 0.552 0.35 

1024 0.078 3.01 1.05 

2048 0.13 18.8 3.59 

4096 0.239 129.92 13.33 
BV 8 0.037 0.007 0.007 0.068 0.288 0.005 

16 0.045 0.009 0.009 0.072 0.461 0.017 

32 0.06 0.013 0.012 0.082 1.21 Memory Error 

64 0.095 0.033 0.019 0.105 4.64 

128 0.17 0.116 0.036 Not Supported 20.72 

256 0.33 0.42 0.082 Memory Error 

512* 0.68 2.12 0.235 

1024 1.43 10.65 0.753 

2048** |3.1 Timeout (15 min.) | 2.76 

4096 6.78 10.77 
DJ 8 0.037 0.007 0.009 0.069 0.401 0.008 

16 0.045 0.01 0.012 0.075 0.873 0.034 

32 0.06 0.016 0.019 0.087 2.97 Memory Error 

64 0.092 0.042 0.036 0.115 8.63 

128 0.16 0.17 0.082 Not Supported 43.53 

256 0.3 0.72 0.235 Memory Error 

512* 0.6 3.9 0.753 

1024 1.22 20.92 2.76 

2048** |2.55 Timeout (15 min.) | 10.77 

4096 5.55 43.94 
Simons Alg. 4 0.05 0.014 0.008 0.064 0.272 0.004 

8 0.076 0.043 0.015 0.101 0.653 0.02 

16 Timeout (15 min.) | 9.8 8.89 1.267 2.56 Memory Error 

32 Timeout (15 min.) | Timeout (15 min.) | Timeout (15 min.) | 17.34 

64 267 
QFT 4 0.03 0.007 0.007 0.064 0.023 0.004 

8 0.04 0.043 0.009 0.068 0.035 0.012 

16 182.34 4.98 0.013 0.103 0.074 0.438 

32 Timeout (15 min.) | Timeout (15 min.) | 0.027 0.154 0.231 Memory Error 

64 0.104 0.363 1.64 

128 0.498 Not Supported 10.32 

256 2.73 103.65 

512 17.54 Timeout (15 min.) 

1024 148.5 
Grovers Alg. 4 0.055 0.015 0.019 0.239 Memory Error Memory Error 

8 1.62 6.55 0.013 0.145 

16 Timeout (15 min.) | Timeout (15 min.) | 0.369 2.45 

32 Timeout (15 min.) | Timeout (15 min.) 
Shor’s Alg. (15, 2) |4 Timeout (15 min.) | Timeout (15 min.) | 0.034 2.83 Timeout (15 min.) | Timeout (15 min.) 
Shor’s Alg. (21, 2) |5 Timeout (15 min.) | Timeout (15 min.) | 0.252 9.35 Timeout (15 min.) | Timeout (15 min.) 
Shor’s Alg. (39, 2) |5 Timeout (15 min.) | Timeout (15 min.) | 0.766 21.94 Timeout (15 min.) | Timeout (15 min.) 
Shor’s Alg. (69, 4) Timeout (15 min.) | Timeout (15 min.) | Timeout (15 min.) | 204.08 Timeout (15 min.) | Timeout (15 min.) 
Shor’s Alg. (95, 8) Timeout (15 min.) | Timeout (15 min.) | Timeout (15 min.) | 192.05 Timeout (15 min.) | Timeout (15 min.) 
Shor’s Alg. (119, 2) |8 Timeout (15 min.) | Timeout (15 min.) | Timeout (15 min.) | 206.62 Timeout (15 min.) | Timeout (15 min.) 


the results on these benchmarks differ from [11]. (Note that the results reported 
in Table 2 do not include the time taken for the construction of the oracle.) 


We also compared QUASIMODO with three other quantum-simulation tools: 
MQT DDSim [18], Quimb [8], and Google Tensor Network (GTN) [10]. MQT 


Table 2. Performance of CFLOBDDs, BDDs, and WBDDs using QUASIMODO on an alternate 
circuit implementation of the GHZ, BV, and DJ algorithms 


Benchmark | #Qubits | CFLOBDD | BDD WBDD 
Time (sec) | Time (sec) Time (sec) 
GHZ 8 0.03 0.008 0.009 
16 0.03 0.01 0.011 
32 0.034 0.035 0.017 
64 0.036 0.194 0.032 
128 0.04 1.47 Precision Issue 
256 0.05 11.77 
512 0.07 Timeout (15 min.) 
1024 0.11 
2048 0.19 
4096 0.36 
BV 8 0.001 0.001 0.001 
16 0.001 0.001 0.001 
32 0.002 0.006 0.001 
64 0.003 0.025 0.001 
128 0.005 0.089 Precision Issue 
256 0.009 0.46 
512 0.015 Timeout (15 min.) 
1024 0.027 
2048 0.049 
4096 0.086 
DJ 8 0.005 0.001 0.001 
16 0.005 0.002 0.001 
32 0.005 0.006 0.001 
64 0.006 0.025 0.001 
128 0.006 0.084 Precision Issue 
256 0.007 0.43 
512 0.008 Timeout (15 min.) 
1024 0.01 
2048 0.013 
4096 0.019 


DDSim is based on WBDDs (using MQT DD), whereas Quimb and GTN are 
based on tensor networks. Their performance is shown in columns 6-8 of Table 1. 
Note that MQT DDSim does not support more than 128 qubits. 


6 Conclusion 


In this paper, we presented QUASIMODO, an extensible, open-source framework 
for quantum simulation using symbolic data-structures. QUASIMODO supports 
CFLOBDDs and both unweighted and weighted BDDs as the underlying data- 
structures for representing quantum states and for performing quantum-circuit 
operations. QUASIMODO is implemented as a Python library. It provides an API 
to commonly used quantum gates and quantum operations, and also supports 
operations for (i) computing the probability of a measurement leading to a given 
set of states, (ii) obtaining a representation of the set of states that would be 
observed with a given probability, and (iii) measuring an outcome from a quan- 
tum state. 


Abstract. This paper proposes an automated method to check the cor- 
rectness of range analysis used in the Linux kernel’s eBPF verifier. We 
provide the specification of soundness for range analysis performed by 
the eBPF verifier. We automatically generate verification conditions that 
encode the operation of the eBPF verifier directly from the Linux kernel’s 
C source code and check it against our specification. When we discover 
instances where the eBPF verifier is unsound, we propose a method to 
generate an eBPF program that demonstrates the mismatch between the 
abstract and the concrete semantics. Our prototype automatically checks 
the soundness of 16 versions of the eBPF verifier in the Linux kernel ver- 
sions ranging from 4.14 to 5.19. In this process, we have discovered new 
bugs in older versions and proved the soundness of range analysis in the 
latest version of the Linux kernel. 


Keywords: Abstract interpretation · Program verification · Program 
synthesis · Kernel extensions · eBPF 


1 Introduction 


Extended Berkeley Packet Filter (eBPF) enables the Linux kernel to be extended 
with user-developed functionality. Historically, eBPF has its roots in a domain- 
specific language for efficient packet filtering [53], wherein a user can write a 
description of packets that must be captured by the network stack. In its modern 
form, eBPF is an in-kernel register-based virtual machine with a custom 64-bit 
RISC instruction set. eBPF programs can be Just-in-Time (JIT) compiled to 
the native processor hardware with access to a subset of kernel functions and 
memory. Programs written in eBPF are widely used in the industry, e.g. for load 
balancing [10], DDoS mitigation [38], and access control [12]. 


eBPF Verifier. A user should be able to attach expressive programs within 
the operating system, while ensuring that they are safe to run. For this pur- 
pose, Linux has a built-in eBPF verifier [11] which performs a static analysis 
of the eBPF program to check safety properties before allowing the program 

Fig. 1. Agni’s methodology for automatically checking the correctness of the eBPF 
verifier on each commit. When we find the kernel to be unsound, we generate an eBPF 
program (i.e., a POC) highlighting the mismatch between abstract and concrete semantics. 
When we are not able to generate a POC, the kernel requires manual verification. 


to be loaded. Given that the verifier is executed in a production kernel, any 
bug in the verifier creates a huge attack surface for exploits [50,51,62,66] and 
vulnerabilities [1-9, 23-26, 35, 43-45]. 


Abstract Interpretation in the Kernel. The verifier, among other things, 
tracks the values of its variables which it subsequently uses to deem memory 
accesses to the kernel data structures to be safe. The eBPF static analyzer 
employs abstract interpretation [33] with multiple abstract domains to track the 
types, liveness, and values of program variables across all executions. It uses five 
abstract domains to track the values of variables (i.e., value tracking); four of 
them are variants of interval domains and the other is a bitwise domain named 
tnum [55,57,65,71]. The kernel implements abstract operators for each of these 
domains efficiently. Unlike traditional sound composition of sound operators typ- 
ically done with abstract interpretation (i.e., modular reduced products) [31], the 
abstract operators are composed in a non-modular fashion. Specifically, the kernel 
mixes up the implementation of abstract operators in one domain with reduction 
operators that combine information across domains (Sect. 3, see Fig. 2(d)). 
Further, the Linux kernel does not provide any soundness guarantees for these 
operators. This makes the task of verification challenging because each abstract 
domain’s correctness individually does not necessarily imply the correctness of 
their composition. To the best of our knowledge, there are no existing sound 
reduction operators for the abstract domains in the kernel. 


This Paper. We propose an automated verification approach to check the sound- 
ness of the eBPF verifier for value tracking. To perform soundness checks on every 
kernel commit, we automatically generate a formula representing the actions of 
the abstract operator from the verifier’s C code rather than manually writing them 
(Sect. 5). Figure 1 illustrates our workflow. We develop a general correctness spec- 
ification to determine when a non-modular abstract operator that combines mul- 
tiple domains is sound (Sect. 4.1). When we checked the validity of the formula 
generated from recent versions of the verifier with the correctness specification, 
we found that the verifier is unsound. We discovered that the verifier avoids
manifesting these soundness bugs through a shared reduction operator that
preconditions the input abstract values (Sect. 4.2). Refining our correctness
specification revealed that recent versions of the verifier are indeed sound.

When our refined soundness check fails, we generate a concrete eBPF pro- 
gram that demonstrates the mismatch between abstract values maintained by 
the verifier and the concrete execution of the eBPF program using program syn- 
thesis methods (Sect. 4.3). We call our approach differential synthesis because it 
generates programs that exercise the divergence between abstract verifier seman- 
tics and concrete eBPF semantics in unsound kernels. 


Prototype and Results. We have used our prototype, Agni [18,72], to
automatically check the soundness of 16 kernel versions starting from 4.14 to 5.19.
In this process, we have discovered 27 previously unknown bugs, which have 
been subsequently fixed by unrelated patches. For each unsound verifier, we 
have generated an eBPF program with at most three instructions that shows 
the mismatch between the semantics in approximately 97% of the cases. The eBPF programs
highlighting the mismatch are smaller than previously known ones. We have also 
shown that the newer versions of the kernel verifier are sound with respect to 
value tracking. The source code for our prototype is publicly available [18,72]. 


2 Background on Abstract Interpretation 


Abstract interpretation is a form of static analysis that uses abstract values from
an abstract domain to represent sets of values of program variables. For example,
in the interval domain, the abstract value $[x, y]$, with $x, y \in \mathbb{Z}$,
$x \le y$, tracks the set of concrete values
$\{z \in \mathbb{Z} \mid x \le z \le y\}$. Abstract operators concisely represent
the impact of the program’s operations over its variables in the abstract domain.


Abstract Domains, Concretization, and Abstraction. Formally, concrete
values form a partially ordered set (poset) with elements $C$ and ordering relation
$\sqsubseteq_C$. The concrete poset is $C \triangleq 2^{\mathbb{Z}}$ (i.e., the
power set of the integers) with the ordering relation $\sqsubseteq_C$ being the
subset relation $\subseteq$. An abstract domain is also a poset, with a set of
elements $A$ and ordering relation $\sqsubseteq_A$. A concretization function
$\gamma : A \to C$ takes an abstract value $a \in A$ and produces concrete values
$c \in C$. For example, the interval domain uses the abstract poset
$A = \mathbb{Z} \times \mathbb{Z}$ with the ordering relation
$[x, y] \sqsubseteq_A [a, b] \Leftrightarrow (a \le x) \wedge (b \ge y)$.

An abstraction function $\alpha : C \to A$ takes a concrete value $c \in C$ and
produces an abstract value $a \in A$. For example, in the interval domain,
abstracting the concrete value $\{1, 4, 6\}$ produces
$\alpha(\{1, 4, 6\}) = [1, 6]$. Concretizing $[1, 6]$ yields
$\gamma([1, 6]) = \{1, 2, 3, 4, 5, 6\}$. As seen in this example, the abstraction
of a concrete value may over-approximate it to maintain a concise representation
in the abstract domain. A value $a \in A$ is a sound abstraction of $c \in C$ if
$c \sqsubseteq_C \gamma(a)$. For a sound abstraction $a$ of $c$, the smaller the
concrete value $\gamma(a)$, the higher the precision of the abstraction.
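The interval example above can be made concrete with a small C sketch. It is only an illustration under stated assumptions: the names `struct interval`, `interval_alpha`, `interval_contains`, and `interval_gamma_size` are hypothetical, the concrete set is assumed finite and non-empty, and `long` stands in for mathematical integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Interval abstract value [lo, hi]; long stands in for mathematical integers. */
struct interval { long lo, hi; };

/* alpha: abstract a non-empty finite set to the tightest enclosing interval. */
static struct interval interval_alpha(const long *set, size_t n)
{
    struct interval a = { set[0], set[0] };
    for (size_t i = 1; i < n; i++) {
        if (set[i] < a.lo) a.lo = set[i];
        if (set[i] > a.hi) a.hi = set[i];
    }
    return a;
}

/* Membership in gamma(a): z is in the concretization iff lo <= z <= hi. */
static bool interval_contains(struct interval a, long z)
{
    return a.lo <= z && z <= a.hi;
}

/* |gamma(a)|: a smaller concretization means a more precise abstraction. */
static long interval_gamma_size(struct interval a)
{
    return a.hi - a.lo + 1;
}
```

Abstracting {1, 4, 6} with this sketch gives [1, 6], whose concretization has six elements even though the original set had three, which is the over-approximation described in the text.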


Abstract Operators. Intuitively, abstract operators capture the computation
of concrete operators over program variables in the abstract domain. For
example, in the interval domain, the action of concrete unary negation
$-_C(\cdot)$ may be abstracted by $-_A([x, y]) = [-y, -x]$. Consider a concrete
operation $f : \mathbb{Z}_n \to \mathbb{Z}_n$ on a single program variable that
is an $n$-bit value. We can lift $f$ point-wise to any set $c \in C$, where
$f(c) \triangleq \{f(z) \mid z \in c\}$. An abstract operator $g : A \to A$ is a
sound abstraction of $f$ if $\forall a \in A : f(\gamma(a)) \sqsubseteq_C \gamma(g(a))$.
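The soundness condition for an abstract operator can be checked exhaustively on a small, bounded domain. The sketch below does this for interval negation. It is an illustration only: `unop_is_sound`, `interval_neg`, and the deliberately wrong `interval_neg_swapped` are hypothetical names, and the bounds are restricted to a symmetric range so that concrete negation cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>

struct interval { int lo, hi; };
typedef struct interval (*abs_unop)(struct interval);

/* Abstract negation from the text: -_A([x, y]) = [-y, -x]. */
static struct interval interval_neg(struct interval a)
{
    struct interval r = { -a.hi, -a.lo };
    return r;
}

/* Deliberately wrong variant that forgets to swap the bounds. */
static struct interval interval_neg_swapped(struct interval a)
{
    struct interval r = { -a.lo, -a.hi };
    return r;
}

/* Check f(gamma(a)) is a subset of gamma(g(a)) for f = negation and every
 * interval whose bounds lie in [-bound, bound]. */
static bool unop_is_sound(abs_unop g, int bound)
{
    for (int lo = -bound; lo <= bound; lo++)
        for (int hi = lo; hi <= bound; hi++) {
            struct interval a = { lo, hi };
            struct interval r = g(a);
            for (int z = lo; z <= hi; z++) {   /* z ranges over gamma(a) */
                int fz = -z;                   /* concrete negation */
                if (fz < r.lo || fz > r.hi)
                    return false;              /* f(z) escaped gamma(g(a)) */
            }
        }
    return true;
}
```

Exhaustive enumeration is feasible only because the domain is tiny; for 64-bit values the same condition has to be discharged symbolically.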


Galois Connection. Abstraction and concretization functions $(\alpha, \gamma)$
are said to form a Galois connection if: (1) $\alpha$ is monotonic (i.e.,
$x \sqsubseteq_C y \Rightarrow \alpha(x) \sqsubseteq_A \alpha(y)$),
(2) $\gamma$ is monotonic ($a \sqsubseteq_A b \Rightarrow \gamma(a) \sqsubseteq_C \gamma(b)$),
(3) $\gamma \circ \alpha$ is extensive (i.e.,
$\forall c \in C : c \sqsubseteq_C \gamma(\alpha(c))$), and (4) $\alpha \circ \gamma$
is reductive (i.e., $\forall a \in A : \alpha(\gamma(a)) \sqsubseteq_A a$) [56].
The Galois connection is denoted as
$(C, \sqsubseteq_C) \overset{\gamma}{\underset{\alpha}{\leftrightarrows}} (A, \sqsubseteq_A)$.
The existence of a Galois connection enables reasoning about the soundness and
the precision of any abstract operator. It is in principle possible to compute a
sound and precise abstraction of any concrete operator $f$ through the
composition $\alpha \circ f \circ \gamma$. However, it is computationally
expensive, due to the evaluation of the concretization $\gamma$.


Combining Multiple Abstract Domains Through Cartesian Product
[31]. Suppose we are given two abstract domains (sets $A_1, A_2$) with sound
abstraction functions $\alpha_{A_1}, \alpha_{A_2}$ and concretization functions
$\gamma_{A_1}, \gamma_{A_2}$. The Cartesian product abstract domain uses the set
$P \triangleq A_1 \times A_2$, and the ordering relation applied separately to
each domain:
$(a_1 \sqsubseteq_{A_1} b_1) \wedge (a_2 \sqsubseteq_{A_2} b_2) \Leftrightarrow (a_1, a_2) \sqsubseteq_P (b_1, b_2)$.
The concretization function intersects the results obtained from concretizing
each element in its respective abstract domain:
$\gamma_P(a_1, a_2) \triangleq \gamma_{A_1}(a_1) \cap \gamma_{A_2}(a_2)$. For a
concrete value $c \in C$, the abstraction functions are applied domain-wise and
combined: $\alpha_P(c) = (\alpha_{A_1}(c), \alpha_{A_2}(c))$. The Cartesian
product domain enjoys a Galois connection
$(C, \sqsubseteq_C) \overset{\gamma_P}{\underset{\alpha_P}{\leftrightarrows}} (P, \sqsubseteq_P)$
building on the Galois connections of its component abstract domains.

For example, consider the interval domain ($A_1$, with $\sqsubseteq_{A_1}$
defined as above) and the parity domain
($A_2 \triangleq \{\bot, \mathit{odd}, \mathit{even}, \top\}$ with ordering
relation $\bot \sqsubseteq_{A_2} \mathit{odd}, \mathit{even} \sqsubseteq_{A_2} \top$).
Suppose at some point the two interpretations produce abstract values $[3, 5]$
and $\mathit{even}$ in the two domains. The concretization of the Cartesian
product abstract value $([3, 5], \mathit{even})$ produces the set $\{4\}$, which
is smaller than the concretizations of either abstract value $[3, 5]$ or
$\mathit{even}$ in their respective domains. However, since the abstraction
functions are applied domain-wise, such information cannot be propagated to the
abstract values themselves. For example, it is desirable to propagate
information from the abstract value $\mathit{even}$ in $A_2$ to reduce the
interval to $[4, 4]$ in $A_1$.
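The product concretization in this example can be sketched in C as follows. This is only an illustration: the type and function names (`struct product`, `product_contains`, `product_gamma_size`) are hypothetical, and the concretization is counted by enumerating the interval, which works only for small intervals.

```c
#include <assert.h>
#include <stdbool.h>

enum parity { PAR_BOT, PAR_ODD, PAR_EVEN, PAR_TOP };
struct interval { long lo, hi; };
struct product { struct interval iv; enum parity par; };

/* Membership in the concretization of a parity value. */
static bool parity_contains(enum parity p, long z)
{
    bool odd = (z % 2 != 0);
    switch (p) {
    case PAR_BOT:  return false;
    case PAR_ODD:  return odd;
    case PAR_EVEN: return !odd;
    default:       return true;   /* PAR_TOP */
    }
}

static bool interval_contains(struct interval a, long z)
{
    return a.lo <= z && z <= a.hi;
}

/* gamma_P is the intersection of the two component concretizations. */
static bool product_contains(struct product p, long z)
{
    return interval_contains(p.iv, z) && parity_contains(p.par, z);
}

/* |gamma_P(p)|, computed by enumerating the interval component. */
static long product_gamma_size(struct product p)
{
    long n = 0;
    for (long z = p.iv.lo; z <= p.iv.hi; z++)
        if (parity_contains(p.par, z))
            n++;
    return n;
}
```

For ([3, 5], even) the product concretization contains only 4, while the stored interval component is still [3, 5]; the precision exists in the intersection but not in the abstract values themselves.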


Reduced Products. Intuitively, we wish to make an abstract value in one
domain more precise using information available in an abstract value in a
different domain. Suppose we are given an abstract value $(a_1, a_2)$ from the
Cartesian product domain. A reduction operator [34] attempts to find the smallest
abstract value $(a_1', a_2')$ such that its concretization is the same as that of
$(a_1, a_2)$, i.e., $\gamma_{A_1}(a_1) \cap \gamma_{A_2}(a_2)$. Formally, the
reduction operator $\rho : P \to P$ is defined as the greatest lower bound of all
abstract values whose concretization is larger than that of the given abstract
value, i.e.,
$\rho(a_1, a_2) \triangleq \sqcap_P \{(a_1', a_2') \mid \gamma_P(a_1, a_2) \sqsubseteq_C \gamma_P(a_1', a_2')\}$.
However, this definition is impractical to compute even on finite domains.
In general, more “relaxed” versions of reduction operators may be designed
to improve precision with efficient computation. For example, Granger [40]
introduces a set of reduction operators $\rho_1, \rho_2$ to reduce each abstract
domain in turn, using information from the other, until a fixed point. The
operator $\rho_1 : A_1 \times A_2 \to A_1$ reduces the abstract value in domain
$A_1$, while $\rho_2 : A_1 \times A_2 \to A_2$ reduces that in $A_2$. The
reduction using $\rho_1$ is sound if $\forall a_1 \in A_1, a_2 \in A_2 :
\gamma_P(\rho_1(a_1, a_2), a_2) = \gamma_P(a_1, a_2)$ (preserve concrete values
in the intersection) and $\rho_1(a_1, a_2) \sqsubseteq_{A_1} a_1$ (improve
precision). Similarly, reduction using $\rho_2$ is sound if
$\forall a_1 \in A_1, a_2 \in A_2 : \gamma_P(a_1, \rho_2(a_1, a_2)) = \gamma_P(a_1, a_2)$
and $\rho_2(a_1, a_2) \sqsubseteq_{A_2} a_2$.
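A Granger-style pair of reductions for the interval and parity domains might look like the following sketch. The functions `reduce_interval` (playing the role of $\rho_1$), `reduce_parity` ($\rho_2$), and `reduce_product` are hypothetical names chosen for illustration; they are one simple choice of relaxed reductions, not the optimal reduction operator.

```c
#include <assert.h>
#include <stdbool.h>

enum parity { PAR_BOT, PAR_ODD, PAR_EVEN, PAR_TOP };
struct interval { long lo, hi; };
struct product { struct interval iv; enum parity par; };

/* rho_1: move each bound inward to the nearest value of the right parity. */
static struct interval reduce_interval(struct interval iv, enum parity par)
{
    if (par == PAR_ODD || par == PAR_EVEN) {
        bool want_odd = (par == PAR_ODD);
        if (iv.lo <= iv.hi && ((iv.lo % 2 != 0) != want_odd)) iv.lo++;
        if (iv.lo <= iv.hi && ((iv.hi % 2 != 0) != want_odd)) iv.hi--;
    }
    return iv;
}

/* rho_2: a singleton interval determines the parity when it was unknown. */
static enum parity reduce_parity(struct interval iv, enum parity par)
{
    if (iv.lo == iv.hi && par == PAR_TOP)
        return (iv.lo % 2 != 0) ? PAR_ODD : PAR_EVEN;
    return par;
}

/* Apply rho_1 and rho_2 in turn until neither changes the abstract value. */
static struct product reduce_product(struct product p)
{
    for (;;) {
        struct product q;
        q.iv = reduce_interval(p.iv, p.par);
        q.par = reduce_parity(q.iv, p.par);
        if (q.iv.lo == p.iv.lo && q.iv.hi == p.iv.hi && q.par == p.par)
            return q;
        p = q;
    }
}
```

Each step only shrinks a component and never removes a value that lies in both concretizations, which mirrors the two soundness requirements stated above.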


3 Abstract Interpretation in the Linux Kernel 


The Linux kernel implements abstract interpretation to check the safety of eBPF 
programs loaded into the kernel. The kernel’s algorithms are encoded into a 
component called the eBPF verifier, which is a part of the pre-compiled oper- 
ating system image. The Linux kernel uses several abstract domains to track 
the type, liveness, and values of registers and memory locations used by eBPF 
programs. Among these, the abstract domains used by the kernel to track values 
are critical since they are used to guard statically against malicious programs 
that may access kernel memory. In Linux kernel v5.19 (latest as of this writing), 
these analyses constitute roughly 2100 lines of source code in the eBPF verifier. 
Implementing such analyses soundly in the kernel is challenging. This part of
the verifier has been a source of several high-profile security vulnerabilities
[1-9, 23-26, 35, 43-45] and exploits [50,51,62,66].

The Linux kernel uses five abstract domains for value tracking, including 
intervals in unsigned 64-bit (u64), unsigned 32-bit (u32), signed 64-bit (s64), 
signed 32-bit (s32), and tri-state numbers (tnum [61,71]). The kernel does not 
provide a formal specification of their abstraction or concretization functions, or 
proofs of soundness of the abstract operators. Below, we illustrate the abstract 
domains used in the Linux kernel with the unsigned 64-bit interval domain u64 
and tristate numbers tnum. 


The u64 Domain. The u64 abstract domain tracks an upper and lower bound 
of a 64-bit register interpreted as an unsigned 64-bit value. The eBPF verifier 
maintains the abstract u64 value as part of its static state for each register. 
Figure 2(a) provides a simplified C source code for abstract addition in the u64 
domain. The operator takes two abstract values in1 and in2, with the two
components of each abstract value denoted by the members u64_min and u64_max. The
output abstract value is stored in out. Here, U64_MAX is the largest 64-bit
non-negative integer. The first if condition detects if integer overflows may
occur as a result of addition. If there is overflow, the analysis loses all
precision, setting the 64-bit bounds of the result to the largest abstract value,
[0, U64_MAX]. If there is no overflow (else clause), out is set to the
component-wise sum of the bounds of in1 and in2, similar to unbounded bit-width
interval arithmetic [32].
Formally, the abstract domain is
$A_{u64} = \{[x, y] \mid (x, y \in \mathbb{Z}_{64}^{+}) \wedge (x \le_{u64} y)\}$,
where $\mathbb{Z}_{64}^{+}$ is the set of 64-bit non-negative integers, and
$\le_{u64}$ represents a 64-bit unsigned comparison. The ordering relationship is
$(x_1 \ge_{u64} x_2) \wedge (y_1 \le_{u64} y_2) \Leftrightarrow [x_1, y_1] \sqsubseteq_{u64} [x_2, y_2]$.
The concretization function is
$\gamma_{u64}([x, y]) = \{z \mid (z \in \mathbb{Z}_{64}^{+}) \wedge (x \le_{u64} z \le_{u64} y)\}$.
The abstraction function is
$\alpha_{u64}(c) = [\min_{u64}(c), \max_{u64}(c)]$, where $c$ is a member of the
powerset of $\mathbb{Z}_{64}^{+}$, and $\min_{u64}(\cdot)$ and $\max_{u64}(\cdot)$
compute the minimum and maximum over a finite set $c$ where each element of $c$
is interpreted as a 64-bit unsigned value.

    (a)  if ((in1.u64_min + in2.u64_min < in2.u64_min) ||
             (in1.u64_max + in2.u64_max < in2.u64_max)) {
             out.u64_min = 0;
             out.u64_max = U64_MAX;
         } else {
             out.u64_min = in1.u64_min + in2.u64_min;
             out.u64_max = in1.u64_max + in2.u64_max;
         }

    (b)  out.u64_min = out.tnum.v;
         out.u64_max = min(in1.u64_max, in2.u64_max);

    (c)  [Diagram. Labels: Domain 1, Domain 2; abstract operands; abstract
         operator; abstract result; reduction operator; modular reduced product.]

    (d)  [Diagram. Labels: Domain 1, Domain 2; abstract operands; abstract
         operator; reduction/abstract operator; abstract result; one-shot
         reasoning.]

Fig. 2. Excerpts (simplified) from the kernel’s implementation of the abstract
operators for (a) addition (from the function scalar_min_max_add [14]), and
(b) bitwise AND (from scalar_min_max_and [15]). (c) Example of reduced product
abstract interpretation where one may use inductive assertions on abstract
operators from each domain, along with the soundness of reduction operators, to
reason about the correctness of the overall abstraction. The greyed boxes show
modular reasoning about components within the boxes. (d) In the Linux kernel, it
is challenging to reason modularly about the correctness of abstract operators in
each domain independently from their pairwise reductions, since the
implementation combines abstraction with reduction. Proving soundness requires
one-shot reasoning about all operations together.
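The simplified addition excerpt of Fig. 2(a) can be wrapped into a self-contained C function for experimentation. The wrapper below is a sketch, not kernel code: `struct u64_range` and `u64_abs_add` are hypothetical names, and `UINT64_MAX` from `<stdint.h>` plays the role of the kernel's `U64_MAX`.

```c
#include <assert.h>
#include <stdint.h>

/* Abstract value in the u64 interval domain: [u64_min, u64_max]. */
struct u64_range { uint64_t u64_min, u64_max; };

/* Abstract addition following the structure of the Fig. 2(a) excerpt. */
static struct u64_range u64_abs_add(struct u64_range in1, struct u64_range in2)
{
    struct u64_range out;

    /* With unsigned wrap-around, a + b < b holds exactly when a + b overflowed. */
    if ((in1.u64_min + in2.u64_min < in2.u64_min) ||
        (in1.u64_max + in2.u64_max < in2.u64_max)) {
        /* Possible overflow: give up all precision. */
        out.u64_min = 0;
        out.u64_max = UINT64_MAX;
    } else {
        /* No overflow: component-wise sum of the bounds. */
        out.u64_min = in1.u64_min + in2.u64_min;
        out.u64_max = in1.u64_max + in2.u64_max;
    }
    return out;
}
```

For instance, adding [1, 5] and [10, 20] gives [11, 25], whereas adding [1, 1] to the full range [0, UINT64_MAX] falls into the overflow case and returns the full range.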


Tristate Numbers (tnums). This abstract domain in the Linux kernel tracks 
which bits of a variable are known to be 0, known to be 1, or unknown across 
executions of the program. This domain is similar to bitwise domains [55, 57, 65]. 
However, the kernel implements this abstract domain efficiently with a tuple of 
two unsigned integers (v,m). If m for a particular bit is 1, then the value of that 
bit is unknown. If m for a particular bit is 0, then the value of that bit is
equal to v’s value for the particular bit. More formally, the abstraction
function ($\alpha_t$) is written using two other functions defined as follows:
$\alpha_{\&}(C) \triangleq \&\{c \mid c \in C\}$; and
$\alpha_{|}(C) \triangleq\ |\{c \mid c \in C\}$. Then,
$\alpha_t(C) \triangleq (\alpha_{\&}(C),\ \alpha_{\&}(C) \oplus \alpha_{|}(C))$.
The concretization function is written as:
$\gamma_t(P) = \gamma_t((P.v, P.m)) \triangleq \{c \in \mathbb{Z}_{64}^{+} \mid c\ \&\ \neg P.m = P.v\}$ [71].
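The tnum abstraction and concretization can be sketched directly from these definitions. The code below is an illustration with hypothetical names (`struct tnum`, `tnum_alpha`, `tnum_contains`); it assumes a finite, non-empty input set and is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Tristate number: v holds the known-one bits, m marks the unknown bits. */
struct tnum { uint64_t v, m; };

/* alpha: v is the AND of all values; a bit is unknown where AND and OR differ. */
static struct tnum tnum_alpha(const uint64_t *set, size_t n)
{
    uint64_t and_all = set[0], or_all = set[0];
    for (size_t i = 1; i < n; i++) {
        and_all &= set[i];
        or_all |= set[i];
    }
    struct tnum t = { and_all, and_all ^ or_all };
    return t;
}

/* Membership in gamma(t): c must agree with v on every known bit. */
static bool tnum_contains(struct tnum t, uint64_t c)
{
    return (c & ~t.m) == t.v;
}
```

Abstracting {4, 5, 6} yields v = 4 and m = 3 (bit 2 known to be one, bits 0 and 1 unknown), so the concretization also contains 7: the tnum over-approximates the original set.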


Abstract Operators In The Linux Kernel and Challenges in Proving their
Correctness. The Linux kernel implements an abstract operator in
each abstract domain for each arithmetic and logic (ALU) instruction and each
jump instruction in the eBPF instruction set.¹ The kernel verifier also provides
functions to propagate information between the abstractions (reductions).
However, it does not provide formal underpinnings, e.g., Galois connections. The
overall analysis appears to be a Reduced Product abstract interpretation (Sect. 2).

¹ The ALU instructions include 32 and 64-bit add, sub, mul, div, or, and, lsh,
rsh, neg, mod, xor, arsh and the jump instructions include 32 and 64-bit ja, jeq,
jgt, jge, jlt, jle, jset, jne, jsgt, jsge, jslt, jsle [13].

However, the key challenge in proving soundness is that the kernel’s operators 
combine abstraction with reduction. Consider the excerpt in Fig. 2(b) from the 
implementation of the bitwise AND operation in the u64 abstract domain in 
the kernel, simplified for clarity. As before, in1 and in2 correspond to the input 
abstract values, and out to the output abstract value. The members with names 
tnum.* denote the components of the abstract tnum. Before the execution of these
two lines, the tnum abstract output out.tnum.v has already been computed. In 
the first line, the lower bound of the u64 result, out.u64_min is updated using the 
output abstract value in a different domain (out.tnum.v). Hence, the operation 
overall is not (merely) an abstract operator in the u64 domain. In the second 
line, the output abstract state out.u64_max is updated using the abstract inputs 
in the u64 domain. Reduction operators consume abstract outputs, not inputs. 
Hence, the operation overall is not a reduction operator either. 

These characteristics apply not just to the kernel’s bitwise AND operation 
in the u64 domain. Figure 2(d) shows the structure of several of the kernel’s
abstract operators, compared against the typical structure of product domains 
and reduction operators (Fig. 2(c)). The kernel’s algorithms combine abstrac- 
tion with reduction, making it challenging to prove their soundness in a mod- 
ular fashion. Instead, we must resort to a “one-shot” approach, which attempts 
to prove the soundness of the abstraction of an operator in one domain and 
the reductions across domains together. We call the kernel’s abstract operators 
abstraction/reduction operators in the rest of this paper. 


4 Automatic Verification of the Kernel’s Algorithms 


Given the non-modular structure of the kernel’s abstract algorithms (Sect. 3), 
we cannot use traditional methods to prove their soundness, i.e. by showing the 
soundness of each domain and the reductions separately. Further, the kernel’s 
algorithms have been evolving continuously with the inclusion of new features 
to the eBPF run-time environment. We want our methods to be applicable to 
every new update and commit to the Linux kernel. 

Hence, our goal is to perform automatic verification using SMT solvers to 
prove the soundness of (or find bugs in) the C implementation of Linux’s abstrac- 
tion/reduction operators. We work with the input-output semantics of the ker- 
nel’s abstraction/reduction operators in first-order logic extracted automatically 
from the kernel’s C source code (details of the extraction deferred to Sect. 5). 


Overview of Our Approach. We develop generic soundness specifications for the 
Linux kernel’s abstraction/reduction operators, handling arithmetic, logic, and 
branching instructions (Sect. 4.1). We find that several kernel operators violate 
these soundness specifications. However, many of these violations flag latent bugs 
in the kernel’s algorithms—bugs which are not necessarily manifested in concrete 
program executions. We observe that the kernel includes a shared “tail” of
computation in all of its abstraction/reduction operators. We use this shared
computation to refine our soundness specification by preconditioning the input abstract
states (Sect. 4.2). This refinement enables proving the soundness of several of the 
kernel’s operators. However, it still identifies many potential violations of sound- 
ness in the kernel. We present a method based on program synthesis to generate 
loop-free eBPF programs that manifest the bugs identified by the soundness spec- 
ifications, automatically producing programs that have divergent concrete and 
abstract semantics. We call this method differential synthesis (Sect. 4.3). 

Figure 1 illustrates our entire workflow. Starting from the Linux kernel source 
code, our techniques produce concrete eBPF programs that manifest soundness 
bugs in the kernel’s algorithms. We have used this procedure to prove the
soundness of multiple Linux kernel versions and to discover previously unknown
soundness bugs (i.e., no CVEs assigned, to our knowledge), with validated
proof-of-concept programs triggering those bugs.


4.1 Soundness Specification for Abstraction/Reduction Operators 


We present verification conditions that are sufficient to assert the soundness of 
abstraction/reduction operators in the Linux kernel. 


Preliminaries. Encoding Soundness for a Single Abstract Domain in
SMT. We describe how to encode the soundness condition for an abstract
operator of two operands as an SMT formula, since most eBPF instructions take two
operands. Suppose $f : C \times C \to C$ is a binary concrete operation (e.g.,
64-bit addition) over the concrete domain (e.g., $C = 2^{\mathbb{Z}_{64}}$).
Suppose the operator $g : A \times A \to A$ abstracts $f$. Operator $g$ is sound
(Sect. 2) if
$\forall a_1, a_2 \in A : f(\gamma(a_1), \gamma(a_2)) \sqsubseteq_C \gamma(g(a_1, a_2))$.

We can check soundness with an SMT query as follows. Suppose we have
SMT variables to denote a bitvector $x \in C$ and an abstract value $a \in A$. We
can use the concretization function $\gamma$ to represent the fact that $x$ is
included in the concretization of $a$. For example, for the u64 domain, we may
use the formula
$\mathit{mem}_{u64}(x, a) \triangleq (a.min \le_{u64} x) \wedge (x \le_{u64} a.max)$
to assert that $x \in \gamma(a)$.

The input-output relationship of abstract operator $g$ is available as a
first-order logic formula extracted from the kernel source code (Sect. 5). We
represent the resulting formula as $a^o = \mathit{abs}_g(a^i_1, a^i_2)$, where
$a^i_1$ and $a^i_2$ are input abstract values and $a^o$ is the output abstract
value.

The concrete semantics of the eBPF instruction set determines the input-output
relationship of the concrete operation $f$. For example, the bpf_add64
instruction performs binary addition (with possibility of overflow) of two
64-bit registers, denoted by $+_{64}$. The action of this instruction is encoded
through the formula $x^o = \mathit{conc}_f(x^i_1, x^i_2)$; for bpf_add64,
$\mathit{conc}_f(x^i_1, x^i_2) \triangleq (x^i_1 +_{64} x^i_2)$.

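The concrete ordering relationship $\sqsubseteq_C$ is just the subset operation
$\subseteq$ between two sets. For two sets $S_1, S_2$, we can encode the
relationship $S_1 \subseteq S_2$ by asserting that
$\forall x : x \in S_1 \Rightarrow x \in S_2$. Putting all this together, we can
check the soundness of a single abstract operator $\mathit{abs}_g$ by using an
SMT solver to check the validity of the following formula (i.e., by checking if
the negation is unsatisfiable).

$$
\begin{aligned}
&\forall x^i_1, x^i_2 \in C,\ a^i_1, a^i_2 \in A :\ \mathit{mem}_A(x^i_1, a^i_1) \wedge \mathit{mem}_A(x^i_2, a^i_2)\ \wedge \\
&\quad x^o = \mathit{conc}_f(x^i_1, x^i_2) \wedge a^o = \mathit{abs}_g(a^i_1, a^i_2) \Rightarrow \mathit{mem}_A(x^o, a^o)
\end{aligned}
\tag{1}
$$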
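To build intuition for Eq. 1, the same check can be performed by brute force on a tiny bit width instead of with an SMT solver. The sketch below enumerates every 4-bit interval pair and every concrete input in their concretizations. It is a stand-in for the SMT query, not the approach used in this paper; `NBITS`, `abs_add`, `abs_add_broken`, and `count_violations` are hypothetical names, and `abs_add` is a 4-bit analogue of the Fig. 2(a) excerpt.

```c
#include <assert.h>
#include <stdbool.h>

#define NBITS 4u
#define NMAX  ((1u << NBITS) - 1u)   /* largest NBITS-bit unsigned value */

struct range { unsigned min, max; };
typedef struct range (*abs_binop)(struct range, struct range);

/* mem(x, a): x lies in the concretization of a. */
static bool mem(unsigned x, struct range a) { return a.min <= x && x <= a.max; }

/* conc_f: NBITS-bit addition with wrap-around. */
static unsigned conc_add(unsigned x1, unsigned x2) { return (x1 + x2) & NMAX; }

/* abs_g: sums are computed in a wider type, so overflow shows up as > NMAX. */
static struct range abs_add(struct range a1, struct range a2)
{
    struct range out = { 0, NMAX };
    if (a1.min + a2.min <= NMAX && a1.max + a2.max <= NMAX) {
        out.min = a1.min + a2.min;
        out.max = a1.max + a2.max;
    }
    return out;
}

/* Deliberately broken variant that ignores overflow. */
static struct range abs_add_broken(struct range a1, struct range a2)
{
    struct range out = { (a1.min + a2.min) & NMAX, (a1.max + a2.max) & NMAX };
    return out;
}

/* Number of (a1, a2, x1, x2) tuples that falsify Eq. 1 for operator g. */
static unsigned long count_violations(abs_binop g)
{
    unsigned long bad = 0;
    struct range a1, a2;
    for (a1.min = 0; a1.min <= NMAX; a1.min++)
     for (a1.max = a1.min; a1.max <= NMAX; a1.max++)
      for (a2.min = 0; a2.min <= NMAX; a2.min++)
       for (a2.max = a2.min; a2.max <= NMAX; a2.max++) {
           struct range out = g(a1, a2);
           for (unsigned x1 = a1.min; x1 <= a1.max; x1++)
            for (unsigned x2 = a2.min; x2 <= a2.max; x2++)
                if (!mem(conc_add(x1, x2), out))
                    bad++;
       }
    return bad;
}
```

Each counted tuple corresponds to a satisfying assignment of the negated formula, i.e., what an SMT solver would return as a counterexample.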
Generalizing Soundness To Abstraction/Reduction Operators Spanning
Multiple Abstract Domains. For the abstraction/reduction operators in
Linux (Sect. 3), we can no longer assert soundness for an abstract domain purely
using abstract values from that domain. We show how to extend the reasoning
to two abstract domains. Let us denote the two abstract domains by $A_1$ and
$A_2$. An eBPF instruction has two inputs $(x^i_1, x^i_2)$ and each input has the
corresponding abstract value for each abstract domain. Suppose $a^i_{11}$ and
$a^i_{12}$ correspond to abstract values for the first input from domains $A_1$
and $A_2$, respectively (similarly, $a^i_{21}$ and $a^i_{22}$ for the second
input). Further, each concrete input must be in the intersection of the
concretizations of all its abstract values. Hence, the formula
$\mathit{mem}_{A_1}(x^i_1, a^i_{11}) \wedge \mathit{mem}_{A_2}(x^i_1, a^i_{12}) \wedge \mathit{mem}_{A_1}(x^i_2, a^i_{21}) \wedge \mathit{mem}_{A_2}(x^i_2, a^i_{22})$
must hold.

We denote the kernel’s abstraction/reduction operation, extracted from C
source code, as
$\{a^o_1, a^o_2\} = \mathit{abs}_g(a^i_{11}, a^i_{12}, a^i_{21}, a^i_{22})$. Note
that the kernel’s operation outputs a list of abstract values corresponding to
each abstract domain (unlike Eq. 1). The concrete semantics dictates that
$x^o = \mathit{conc}_f(x^i_1, x^i_2)$.

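To establish the soundness of the abstraction/reduction operator, we ensure
that the concrete output is included in the concretizations of the abstract
outputs in each domain, i.e.,
$\mathit{mem}_{A_1}(x^o, a^o_1) \wedge \mathit{mem}_{A_2}(x^o, a^o_2)$. Putting it
all together, we check the validity of the following SMT formula:

$$
\begin{aligned}
&\forall x^i_1, x^i_2 \in C,\ a^i_{11}, a^i_{21} \in A_1,\ a^i_{12}, a^i_{22} \in A_2 : \\
&\quad \mathit{mem}_{A_1}(x^i_1, a^i_{11}) \wedge \mathit{mem}_{A_2}(x^i_1, a^i_{12}) \wedge \mathit{mem}_{A_1}(x^i_2, a^i_{21}) \wedge \mathit{mem}_{A_2}(x^i_2, a^i_{22})\ \wedge \\
&\quad x^o = \mathit{conc}_f(x^i_1, x^i_2) \wedge \{a^o_1, a^o_2\} = \mathit{abs}_g(a^i_{11}, a^i_{12}, a^i_{21}, a^i_{22}) \\
&\quad \Rightarrow (\mathit{mem}_{A_1}(x^o, a^o_1) \wedge \mathit{mem}_{A_2}(x^o, a^o_2))
\end{aligned}
\tag{2}
$$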


The kernel uses five abstract domains (Sect. 3). Extending from two domains to 
all five domains is straightforward. It involves the addition of membership queries 
for the inputs and the corresponding abstract values (i.e., mem predicate above). 
The encoding of each of the kernel’s abstraction/reduction operators returns a
list containing five abstract outputs (one for each domain). Finally, we check that 
the concrete output is included in the concretization of each abstract output. 
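The two-domain condition of Eq. 2 can likewise be explored by brute force on a tiny bit width. The sketch below combines a 3-bit interval with a 3-bit tnum and checks a bitwise AND operator shaped like the Fig. 2(b) excerpt, where the interval lower bound is taken from the tnum output. Everything here is an assumption made for illustration: the names `struct absval`, `abs_and`, and `count_violations`, the 3-bit width, and the particular tnum AND formula, which is written in the style of a known-bits analysis rather than extracted from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define NBITS 3u
#define NMAX  ((1u << NBITS) - 1u)
#define MAXVALS 1024u

/* One abstract value per domain: interval [min, max] and tnum (v, m). */
struct absval { unsigned min, max, v, m; };

static bool mem_range(unsigned x, struct absval a) { return a.min <= x && x <= a.max; }
static bool mem_tnum(unsigned x, struct absval a)  { return (x & ~a.m) == a.v; }

/* Abstraction/reduction operator for AND (illustrative). */
static struct absval abs_and(struct absval in1, struct absval in2)
{
    struct absval out;
    unsigned alpha = in1.v | in1.m, beta = in2.v | in2.m;
    out.v = in1.v & in2.v;                 /* bits known to be one */
    out.m = alpha & beta & ~out.v;         /* bits that may be one */
    out.min = out.v;                       /* reduction: bound from tnum output */
    out.max = in1.max < in2.max ? in1.max : in2.max;
    return out;
}

/* Enumerate all abstract values with min <= max and well-formed tnums. */
static unsigned all_absvals(struct absval *buf)
{
    unsigned n = 0;
    for (unsigned lo = 0; lo <= NMAX; lo++)
     for (unsigned hi = lo; hi <= NMAX; hi++)
      for (unsigned v = 0; v <= NMAX; v++)
       for (unsigned m = 0; m <= NMAX; m++)
           if ((v & m) == 0) {
               struct absval a = { lo, hi, v, m };
               buf[n++] = a;
           }
    return n;
}

/* Number of tuples that falsify Eq. 2 for abs_and. */
static unsigned long count_violations(void)
{
    static struct absval vals[MAXVALS];
    unsigned n = all_absvals(vals);
    unsigned long bad = 0;
    for (unsigned i = 0; i < n; i++)
     for (unsigned j = 0; j < n; j++) {
         struct absval out = abs_and(vals[i], vals[j]);
         for (unsigned x1 = 0; x1 <= NMAX; x1++) {
             if (!mem_range(x1, vals[i]) || !mem_tnum(x1, vals[i])) continue;
             for (unsigned x2 = 0; x2 <= NMAX; x2++) {
                 if (!mem_range(x2, vals[j]) || !mem_tnum(x2, vals[j])) continue;
                 unsigned xo = x1 & x2;
                 if (!mem_range(xo, out) || !mem_tnum(xo, out)) bad++;
             }
         }
     }
    return bad;
}
```

Note that the check never reasons about the interval part or the tnum part of `abs_and` in isolation: the membership predicates of both domains appear together in the precondition and in the conclusion, which is the one-shot style of reasoning the formula captures.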


Encoding Arithmetic and Logic (ALU) Instructions. Using the formulation
above, we have encoded soundness specifications of abstraction/reduction
operators for 16 eBPF ALU instructions, which include 32 and 64-bit add, sub, 
div, or, and, lsh, rsh, neg, mod, xor, arsh. Notably, we exclude the multipli- 
cation instruction mul, whose SMT formula involves a bitvector multiplication 
operation and a large unrolled loop, making it intractable in the bitvector theory. 


Encoding Branch Instructions. We also encoded soundness specifications for 
conditional and unconditional branches (jeq, jlt, etc.) on both 64 and 32-bit 
register operands. These amount to 20 instructions, for a total of 36 instructions 
captured by our encodings. While the soundness of abstracting ALU instructions 
follows the general structure of Eq. 2, writing down the soundness conditions for 
branches is more involved. Branches do not concretely modify their input regis- 
ters. However, the kernel learns new information in the abstract domains using 
the branch outcome (true vs. false). For example, in the u64 domain, consider
two abstract registers [1, 5], [3,3]. Jumping upon an = (equals) comparison shows
that the first register can also be set to [3,3] in the true case. Indeed, each con- 
ditional jump instruction produces four abstract outputs (rather than the usual 
one output for ALU instructions), corresponding to updated abstract values for 
two registers across two branch outcomes. 

We illustrate the encoding of the correctness condition for the jump
instruction for a single abstract domain. Given two concrete operands $x^i_1$
and $x^i_2$, the concrete interpretation for the jump instruction returns whether
the condition is true or false. When $x^o = \mathit{conc}_f(x^i_1, x^i_2)$, $x^o$
will be either true or false. The kernel’s abstraction/reduction operator
generates four output abstract values, $a^o_{1t}, a^o_{1f}, a^o_{2t}, a^o_{2f}$.
There are two abstract outputs corresponding to each input. They reflect the
updated abstract value for the true case (e.g., $a^o_{1t}$ is the updated
abstract value of the first input when the branch condition is true), and
similarly for the false case. We represent the kernel’s abstraction/reduction
operator for branch instructions by the formula
$\{a^o_{1t}, a^o_{1f}, a^o_{2t}, a^o_{2f}\} = \mathit{abs}_g(a^i_1, a^i_2)$.

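Our correctness condition for jumps requires that, for each branch outcome, the
concrete inputs that lead to that outcome are present in the concretizations of
the corresponding updated abstract values. The formula below specifies this
correctness condition.

$$
\begin{aligned}
&\forall x^i_1, x^i_2 \in C,\ a^i_1, a^i_2 \in A :\ \mathit{mem}_A(x^i_1, a^i_1) \wedge \mathit{mem}_A(x^i_2, a^i_2)\ \wedge \\
&\quad x^o = \mathit{conc}_f(x^i_1, x^i_2) \wedge \{a^o_{1t}, a^o_{1f}, a^o_{2t}, a^o_{2f}\} = \mathit{abs}_g(a^i_1, a^i_2) \Rightarrow \\
&\quad \big((x^o \Rightarrow (\mathit{mem}_A(x^i_1, a^o_{1t}) \wedge \mathit{mem}_A(x^i_2, a^o_{2t})))\ \wedge \\
&\quad\ (\neg x^o \Rightarrow (\mathit{mem}_A(x^i_1, a^o_{1f}) \wedge \mathit{mem}_A(x^i_2, a^o_{2f})))\big)
\end{aligned}
\tag{3}
$$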


The above correctness condition can be extended to multiple domains in a
manner similar to Eq. 2. The kernel’s implementation of the abstraction/reduction
operator for a single jump instruction produces 20 output abstract values (2 
inputs x 2 branch outcomes x 5 domains). 
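The branch condition of Eq. 3 can be exercised on a toy equality jump over 4-bit intervals. In the sketch below, the true branch narrows both operands to the intersection of their intervals and the false branch leaves them unchanged. This operator (`abs_jeq`) is a hypothetical simplification written for illustration, not the kernel's jeq handling, and `count_violations` is a brute-force stand-in for the SMT query.

```c
#include <assert.h>
#include <stdbool.h>

#define NBITS 4u
#define NMAX  ((1u << NBITS) - 1u)

struct range { unsigned min, max; };

/* Four abstract outputs: each operand in the true and false outcomes. */
struct jmp_out { struct range a1t, a1f, a2t, a2f; };

static bool mem(unsigned x, struct range a) { return a.min <= x && x <= a.max; }

/* Toy abstract jeq: equal operands must lie in both intervals. */
static struct jmp_out abs_jeq(struct range a1, struct range a2)
{
    struct jmp_out o;
    struct range meet = { a1.min > a2.min ? a1.min : a2.min,
                          a1.max < a2.max ? a1.max : a2.max };
    o.a1t = meet;   /* true branch: both narrowed to the intersection */
    o.a2t = meet;
    o.a1f = a1;     /* false branch: nothing learned in this toy version */
    o.a2f = a2;
    return o;
}

/* Number of (a1, a2, x1, x2) tuples that falsify Eq. 3 for abs_jeq. */
static unsigned long count_violations(void)
{
    unsigned long bad = 0;
    struct range a1, a2;
    for (a1.min = 0; a1.min <= NMAX; a1.min++)
     for (a1.max = a1.min; a1.max <= NMAX; a1.max++)
      for (a2.min = 0; a2.min <= NMAX; a2.min++)
       for (a2.max = a2.min; a2.max <= NMAX; a2.max++) {
           struct jmp_out o = abs_jeq(a1, a2);
           for (unsigned x1 = a1.min; x1 <= a1.max; x1++)
            for (unsigned x2 = a2.min; x2 <= a2.max; x2++) {
                bool taken = (x1 == x2);          /* concrete outcome x^o */
                bool ok = taken ? (mem(x1, o.a1t) && mem(x2, o.a2t))
                                : (mem(x1, o.a1f) && mem(x2, o.a2f));
                if (!ok) bad++;
            }
       }
    return bad;
}
```

With inputs [1, 5] and [3, 3], the true-branch output for the first register is [3, 3], matching the example given earlier for the u64 domain.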


4.2 Refining Soundness Specification with Input Preconditioning 


When we checked the soundness of the kernel’s verifier using the soundness spec- 
ifications in Sect. 4.1, we observed that many of the abstract operators are not 
sound. However, it is unclear whether these violations are latent unsound behav- 
iors, or behaviors that could actually manifest with concrete eBPF programs. 
Specifically, the precondition in Eq. 2 is too general, including any combination 
of abstract values (across domains) as long as the intersection of their con- 
cretizations is non-empty. Indeed, the abstract operators in the Linux kernel are 
unsound if each instruction may start from any arbitrary abstract value across 
domains. However, these combinations of abstract values may never be encoun- 
tered in any eBPF program. Our goal is to refine the soundness specifications 
from Sect. 4.1 to minimize reporting latent (but unmanifested) bugs. 


Shared Suffix of Abstraction/Reduction Operator. Upon carefully
analyzing the kernel’s abstraction/reduction operators, we observed that
the kernel performs certain common computations, a shared suffix of
abstraction/reduction operations, right before producing each abstract
output (Fig. 3(a)). As a concrete example, in kernel version 5.19, the function
reg_bounds_sync is called at the end of each ALU operation [49], updating the
signed domains using the unsigned domains, the u64 bounds from u32 bounds
and tnums, besides other reductions [48].


Our key insight is that this shared suffix of abstraction/reduction has
the effect of preconditioning the initial abstract values for any subsequent
instruction, narrowing down the set of possible abstract values that a
subsequent instruction may encounter as input. Further, all eBPF programs start
executing from abstract values where each register in every domain is either
$\top$ (any concrete value in the domain) or its concretization is a singleton
(precisely known concrete value). We observe and show using an SMT solver that
the shared suffix computation does not modify initial values.

Fig. 3. (a) The structure of each abstraction/reduction operator in the kernel
can be conceptualized as having a prefix that depends on the specific operator,
generating an intermediate output, and a suffix that is shared across all the
operators, resulting in the final abstract output. (b) We use a refined soundness
specification that preconditions input abstract values $a$ using the shared
suffix $\mathit{sro}(\cdot)$ of the reduction operators used in the Linux kernel.


Refined Soundness Specification by Preconditioning Input
Abstract Values. We can leverage shared suffix operations to refine
our soundness specification as follows. First, let $\mathit{sro}(a)$ denote the
abstract outputs of computing the shared suffix of the abstraction/reduction
over the abstract inputs $a \in A_1 \times A_2 \cdots \times A_5$. The
SMT formula encoding $\mathit{sro}(a)$ is extracted using our C to SMT encoder
(Sect. 5). The main change from the specifications in Sect. 4.1 is that the
shared suffix preconditions the input values to any abstract operator. Hence,
for example, the soundness specification for two abstract domains from Eq. 2 is
updated to use an input abstract value $\mathit{sro}(a)$ as shown below:


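$$
\begin{aligned}
&\forall x^i_1, x^i_2 \in C,\ a^i_{11}, a^i_{21} \in A_1,\ a^i_{12}, a^i_{22} \in A_2 : \\
&\quad (b^i_{11}, b^i_{12}) = \mathit{sro}(a^i_{11}, a^i_{12}) \wedge (b^i_{21}, b^i_{22}) = \mathit{sro}(a^i_{21}, a^i_{22})\ \wedge \\
&\quad \mathit{mem}_{A_1}(x^i_1, b^i_{11}) \wedge \mathit{mem}_{A_2}(x^i_1, b^i_{12}) \wedge \mathit{mem}_{A_1}(x^i_2, b^i_{21}) \wedge \mathit{mem}_{A_2}(x^i_2, b^i_{22})\ \wedge \\
&\quad x^o = \mathit{conc}_f(x^i_1, x^i_2) \wedge \{a^o_1, a^o_2\} = \mathit{abs}_g(b^i_{11}, b^i_{12}, b^i_{21}, b^i_{22}) \\
&\quad \Rightarrow (\mathit{mem}_{A_1}(x^o, a^o_1) \wedge \mathit{mem}_{A_2}(x^o, a^o_2))
\end{aligned}
\tag{4}
$$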
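The effect of input preconditioning can be seen on a toy example in which an operator is unsound on arbitrary abstract inputs but sound on inputs that have passed through a shared suffix. Everything in the sketch below is hypothetical and chosen only to illustrate the difference between Eq. 2 and Eq. 4 on 3-bit values: `sro` tightens the interval using the tnum, and `abs_add_toy` trusts that a constant tnum implies a singleton interval, reading the constant from the interval's lower bound.

```c
#include <assert.h>
#include <stdbool.h>

#define NBITS 3u
#define NMAX  ((1u << NBITS) - 1u)
#define MAXVALS 1024u

struct absval { unsigned min, max, v, m; };   /* interval and tnum */

/* Membership in both domains at once (intersection of concretizations). */
static bool mem(unsigned x, struct absval a)
{
    return a.min <= x && x <= a.max && (x & ~a.m) == a.v;
}

/* Hypothetical shared suffix: tighten the interval using the tnum. */
static struct absval sro(struct absval a)
{
    unsigned tmax = a.v | a.m;
    if (a.min < a.v)  a.min = a.v;
    if (a.max > tmax) a.max = tmax;
    return a;
}

/* Toy addition: if in2's tnum is constant, assume in2.min is that constant. */
static struct absval abs_add_toy(struct absval in1, struct absval in2)
{
    struct absval out = { 0, NMAX, 0, NMAX };          /* top in both domains */
    unsigned lo = in1.min + in2.min;
    unsigned hi = in1.max + ((in2.m == 0) ? in2.min : in2.max);
    if (hi <= NMAX) { out.min = lo; out.max = hi; }    /* no overflow possible */
    return out;
}

static unsigned all_absvals(struct absval *buf)
{
    unsigned n = 0;
    for (unsigned lo = 0; lo <= NMAX; lo++)
     for (unsigned hi = lo; hi <= NMAX; hi++)
      for (unsigned v = 0; v <= NMAX; v++)
       for (unsigned m = 0; m <= NMAX; m++)
           if ((v & m) == 0) {
               struct absval a = { lo, hi, v, m };
               buf[n++] = a;
           }
    return n;
}

/* Violations of Eq. 2 (precondition == false) or Eq. 4 (precondition == true). */
static unsigned long count_violations(bool precondition)
{
    static struct absval vals[MAXVALS];
    unsigned n = all_absvals(vals);
    unsigned long bad = 0;
    for (unsigned i = 0; i < n; i++)
     for (unsigned j = 0; j < n; j++) {
         struct absval b1 = precondition ? sro(vals[i]) : vals[i];
         struct absval b2 = precondition ? sro(vals[j]) : vals[j];
         struct absval out = abs_add_toy(b1, b2);
         for (unsigned x1 = 0; x1 <= NMAX; x1++) {
             if (!mem(x1, b1)) continue;
             for (unsigned x2 = 0; x2 <= NMAX; x2++)
                 if (mem(x2, b2) && !mem((x1 + x2) & NMAX, out)) bad++;
         }
     }
    return bad;
}
```

Without preconditioning, an input such as interval [0, 7] with constant tnum 3 makes the toy operator use 0 instead of 3, so violations are reported; after `sro`, that input becomes [3, 3] and the shortcut is justified. Whether such a violation can appear in a real program depends on whether the offending abstract input is reachable, which is the question Sect. 4.3 addresses.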


It is straightforward to generalize to multiple domains. Refinement eliminated
most of the latent violations reported from Sect. 4.1. We found that the latest
kernel versions are sound with respect to value tracking.
4.3 Automatically Producing Programs Exercising Soundness Bugs


Even after refining the soundness specifications (Sect. 4.2), we still find a few 
violations of soundness. It is challenging to determine whether these violations 
are “real” (manifested in actual eBPF programs) or latent, since input abstract 
values preconditioned by sro still overapproximate the abstract values that may 
occur when analyzing actual eBPF programs (Fig. 3(b), Sect. 4.2). 

We aim to automatically generate eBPF programs that manifest soundness 
bugs (uncovered by the techniques in Sect. 4.2) in an actual kernel verifier exe- 
cution. Our problem is a form of differential synthesis: generating programs 
whose semantics diverge between the concrete execution and the abstract anal- 
ysis. We propose a sound but incomplete approach to generate eBPF programs 
that demonstrate soundness violations. We enumerate loop-free programs up to a 
bounded length, using an SMT solver to identify concrete and abstract operands 
that manifest soundness violations. 

Our approach is a combination of well-known existing techniques from
enumerative [20,52,63] and deductive program synthesis [19,41,58,67]. However,
unlike typical program synthesis problems which have a $\exists\forall$ formula
structure (e.g., meet a specification on all inputs), our problem has a much more
tractable $\exists$ structure, i.e., finding one concrete input and program to
trigger a soundness violation. In this sense, it is more akin to
property-directed reachability algorithms used in model checking [22,27].


Preliminaries. The eBPF run-time starts executing eBPF programs with all 
live registers holding values that are either precisely known at compile time (e.g. 
offsets into valid memory regions) or completely unknown (e.g. contents of packet 
memory). For an abstract value $a \in A_1 \times A_2 \times \cdots \times A_5$, we say that init(a) holds
if a is either a singleton (e.g. $\forall x \in \mathbb{Z}_{64} : [x, x]$ in u64) or $\top$ in each domain $A_j$.
We refer to such abstract values as initial abstract values. It is straightforward
to write down an SMT formula for init(a) for the kernel’s domains. We say
an abstract value $b \in A_1 \times A_2 \times \cdots \times A_5$ is reachable if there exists a sequence
of eBPF instructions for which the abstract analysis can produce b for some
register starting from input registers whose abstract values all satisfy init(·).
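
As a minimal sketch of the init(a) predicate, the following Python fragment checks the "singleton or top" condition for two domains. The `Interval` and `Tnum` classes, their field names, and the restriction to two domains are assumptions made for illustration; the SMT encoding used by the tool is not shown here.

```python
from dataclasses import dataclass

WIDTH = 64
ALL_ONES = (1 << WIDTH) - 1  # largest unsigned 64-bit value


@dataclass(frozen=True)
class Interval:
    """Unsigned interval [lo, hi]; illustrative stand-in for the u64 domain."""
    lo: int
    hi: int


@dataclass(frozen=True)
class Tnum:
    """Tristate number: bits set in `mask` are unknown, the rest come from `value`."""
    value: int
    mask: int


def init_interval(a: Interval) -> bool:
    is_singleton = a.lo == a.hi
    is_top = a.lo == 0 and a.hi == ALL_ONES
    return is_singleton or is_top


def init_tnum(a: Tnum) -> bool:
    is_singleton = a.mask == 0                      # every bit is known
    is_top = a.value == 0 and a.mask == ALL_ONES    # every bit is unknown
    return is_singleton or is_top


def init(interval: Interval, tnum: Tnum) -> bool:
    """init(a) holds when each component is a singleton or top in its domain."""
    return init_interval(interval) and init_tnum(tnum)
```

A precisely known register such as `init(Interval(7, 7), Tnum(7, 0))` satisfies the predicate, while a partially constrained one such as `Interval(1, 2)` does not.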


Overview. Given an abstract operator that violates the soundness specification 
in Sect. 4.2, our algorithm finds an eBPF instruction sequence that shows that 
the violating input abstract values are reachable. For a bounded program length 
k, we enumerate all sequences of eBPF concrete operators (i.e. arithmetic, logic, 
and branching instructions) of length k − 1, with the k-th instruction being the
violating concrete operator. This enumeration produces the “skeleton” of the pro- 
gram, filling out the opcodes, but leaving the operands as well as the data and 
control flow undetermined. For each skeleton, we discharge an SMT query that 
identifies the concrete and abstract operands for k instructions with well-formed 
data and control flow. The first instruction consumes eBPF initial abstract val- 
ues. Starting from k = 1, if we cannot find an eBPF program of length k that 
manifests the violation, we increment k and try again until a timeout. 
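
The outer loop described above can be sketched as follows. This is only an illustration: the opcode list is a hypothetical subset, and `solve` stands in for the SMT query that fills in operands for a skeleton (it is assumed to return a model, or `None` when the query is unsatisfiable). A bound on the length replaces the timeout.

```python
from itertools import product

# Hypothetical subset of eBPF opcodes used to build skeletons.
OPCODES = ["bpf_add", "bpf_and", "bpf_or", "bpf_jump_gt64"]


def skeletons(violating_op, k):
    """Yield every opcode sequence of length k whose last slot is violating_op."""
    for prefix in product(OPCODES, repeat=k - 1):
        yield list(prefix) + [violating_op]


def synthesize(violating_op, solve, max_len):
    """Try k = 1, 2, ..., max_len and return the first (skeleton, model) found."""
    for k in range(1, max_len + 1):
        for skeleton in skeletons(violating_op, k):
            model = solve(skeleton)  # placeholder for the per-skeleton SMT query
            if model is not None:
                return skeleton, model
    return None  # no bug-manifesting program within the length bound
```

Because shorter skeletons are tried first, the first program returned is also a shortest one within the enumerated opcode set.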


Single Instruction Programs (k = 1). As the base case, we check whether 
initial abstract values along with suitable concrete values may already violate 
soundness (Sect. 4.2). For example, suppose our enumeration generated the
1-instruction program v = bpf_or(t,u). For simplicity, below we work with just
one abstract domain. Building on Eq. (1), we discharge the SMT formula: 


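\[
\begin{aligned}
& t, u \in C,\; a_t, a_u \in A : \\
& \mathit{init}(a_t) \land \mathit{init}(a_u) \land \mathit{mem}_A(t, a_t) \land \mathit{mem}_A(u, a_u) \;\land \\
& v = \mathit{conc}_{or}(t, u) \land a_v = \mathit{abs}_{or}(a_t, a_u) \land \neg(\mathit{mem}_A(v, a_v))
\end{aligned}
\tag{5}
\]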


If the formula is satisfiable, the model provides the concrete operands t, u, with 
the result that bpf_or(t,u) is an executable eBPF program manifesting the 
soundness violation. However, an unsound operator may fail to produce a model 
since the necessary abstract operands lie outside the initial abstract values. 
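
To make the base case concrete, here is a toy version of the query in Eq. (5). It is a sketch under several assumptions: a single 3-bit unsigned interval domain, two deliberately buggy abstract OR operators invented for illustration, and exhaustive enumeration standing in for the SMT solver.

```python
from itertools import product

WIDTH = 3
VALUES = range(1 << WIDTH)
TOP = (0, (1 << WIDTH) - 1)
INIT = [(x, x) for x in VALUES] + [TOP]  # singletons and top


def mem(x, a):
    """Membership of concrete x in the interval a = (lo, hi)."""
    return a[0] <= x <= a[1]


def abs_or_max(a, b):
    """Hypothetical buggy operator: wrong even on constants (1 | 2 = 3 > 2)."""
    return (max(a[0], b[0]), max(a[1], b[1]))


def abs_or_const_fold(a, b):
    """Hypothetical buggy operator that is exact on constants only."""
    if a[0] == a[1] and b[0] == b[1]:
        v = a[0] | b[0]
        return (v, v)
    return (max(a[0], b[0]), max(a[1], b[1]))  # unsound for general intervals


def find_violation_k1(abs_or):
    """Search initial abstract values for a model of the k = 1 formula."""
    for a_t, a_u in product(INIT, repeat=2):
        a_v = abs_or(a_t, a_u)
        for t, u in product(VALUES, repeat=2):
            if mem(t, a_t) and mem(u, a_u) and not mem(t | u, a_v):
                return {"t": t, "u": u, "a_t": a_t, "a_u": a_u, "a_v": a_v}
    return None
```

With `abs_or_max` a model exists already at k = 1. With `abs_or_const_fold` the search returns `None` even though the operator is unsound, because the abstract operands that expose the bug are not initial abstract values.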


Straight-line Programs, Length k > 1. The larger the program length k, the
larger the set of reachable input abstract values available to manifest a soundness
violation at the k-th instruction. We exhaustively enumerate all possible
(k − 1)-long instruction sequences. To enable well-formed data flow between the k
instructions, the inputs for each instruction are sourced either from the outputs
of prior instructions or initial abstract values.

For example, consider a two-instruction program (k = 2) generated by the
enumerator: r = bpf_and(p,q); v = bpf_or(t,u). We are looking for a soundness
violation in bpf_or. The variables p, q, r, t, u, v are concrete values, with
corresponding abstract values $a_p, a_q, \ldots, a_v$. The abstract inputs of the first
instruction bpf_and are initial abstract values. The abstract inputs of the last
instruction may be drawn from either $a_p, a_q, a_r$ or the initial abstract values.
We use the formula $\mathit{assign}(x, \{y_1, y_2, \ldots\})$ to denote that x is mapped to one of
the variables $y_1, y_2, \ldots$ in both the concrete and abstract domains. We can write
down $\mathit{assign}(x, \{y_1, y_2, \ldots\}) \triangleq (x = y_1 \land a_x = a_{y_1}) \lor (x = y_2 \land a_x = a_{y_2}) \lor \cdots$.
We discharge the following SMT formula to a solver:


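\[
\begin{aligned}
& p, q, r, t, u, v \in C,\; a_p, a_q, a_r, a_t, a_u, a_v \in A : \\
& \mathit{init}(a_p) \land \mathit{init}(a_q) \land \mathit{mem}_A(p, a_p) \land \mathit{mem}_A(q, a_q) \;\land \\
& r = \mathit{conc}_{and}(p, q) \land a_r = \mathit{abs}_{and}(a_p, a_q) \land \mathit{mem}_A(r, a_r) \;\land \\
& (\mathit{init}(a_t) \lor \mathit{assign}(t, \{p, q, r\})) \land (\mathit{init}(a_u) \lor \mathit{assign}(u, \{p, q, r\})) \;\land \\
& \mathit{mem}_A(t, a_t) \land \mathit{mem}_A(u, a_u) \;\land \\
& v = \mathit{conc}_{or}(t, u) \land a_v = \mathit{abs}_{or}(a_t, a_u) \land \neg(\mathit{mem}_A(v, a_v))
\end{aligned}
\tag{6}
\]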


A model for the formula produces the concrete and abstract operands for the two 
instructions, leading to an executable bug-manifesting program. This approach 
is extensible to more instructions and more abstract domains. 
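
The two-instruction query can be illustrated with the same kind of toy setting. Everything below is an assumption for illustration: a 3-bit unsigned interval domain, a simple sound `abs_and`, a hypothetical `abs_or_buggy` that is exact on constants but wrong on general intervals, and brute-force enumeration in place of the SMT solver.

```python
from itertools import product

WIDTH = 3
VALUES = range(1 << WIDTH)
TOP = (0, (1 << WIDTH) - 1)
INIT = [(x, x) for x in VALUES] + [TOP]


def mem(x, a):
    return a[0] <= x <= a[1]


def abs_and(a, b):
    """Sound for unsigned values: x & y never exceeds min(x, y)."""
    return (0, min(a[1], b[1]))


def abs_or_buggy(a, b):
    """Hypothetical bug: exact on constants, unsound upper bound otherwise."""
    if a[0] == a[1] and b[0] == b[1]:
        v = a[0] | b[0]
        return (v, v)
    return (max(a[0], b[0]), max(a[1], b[1]))


def operand_choices(prior):
    """init(a_x) or assign(x, prior): all (concrete, abstract) operand pairs."""
    fresh = [(x, a) for a in INIT for x in VALUES if mem(x, a)]
    return fresh + list(prior)


def find_violation_k2():
    """Search for a model of r = bpf_and(p, q); v = bpf_or(t, u)."""
    for a_p, a_q in product(INIT, repeat=2):
        for p, q in product(VALUES, repeat=2):
            if not (mem(p, a_p) and mem(q, a_q)):
                continue
            r, a_r = p & q, abs_and(a_p, a_q)
            choices = operand_choices([(p, a_p), (q, a_q), (r, a_r)])
            for (t, a_t), (u, a_u) in product(choices, repeat=2):
                a_v = abs_or_buggy(a_t, a_u)
                if not mem(t | u, a_v):
                    return {"p": p, "q": q, "r": r, "t": t, "u": u,
                            "a_t": a_t, "a_u": a_u, "a_v": a_v}
    return None
```

Since `abs_or_buggy` is sound on initial abstract values, any model returned must route the output of the first instruction into the second, which is exactly the data flow that `assign` permits.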


Loop-free Programs. Incorporating branch instructions significantly broadens
the set of input abstract values available to the k-th instruction, improving the
likelihood of finding a bug-manifesting program at a given length. We turn each
branch into a single-instruction ite whose outputs are available for subsequent
instructions. More concretely, (i) any of the instructions 1, …, k − 1 may be jump
instructions; (ii) the jump target of a branch instruction in the i-th slot for both
outcomes (i.e. true or false) points to the (i + 1)-th slot; and (iii) the abstract
outputs of the branch (e.g. from Eq. (3)) may be used as abstract inputs for
subsequent instructions, similar to arithmetic and logic instructions.

As an example, suppose our enumerator produces r = bpf_jump_gt64(p,q,0);
v = bpf_or(t,u). Here r is a concrete value which is either true or false. We use
0 as the jump target, always pointing branches to the next instruction. There are
four abstract outputs from the jump: $a_{pt}, a_{qt}$ for the true branch and $a_{pf}, a_{qf}$ for
the false branch (see Sect. 4.1). For convenience, we set the abstract value $a_{p^o}$ (resp.
$a_{q^o}$) to either $a_{pt}$ or $a_{pf}$ (resp. $a_{qt}$ or $a_{qf}$) based on the branch outcome; and also
assert that the corresponding final concrete values $p^o = p$ and $q^o = q$. Building on
Eq. (3), we ask the SMT solver for a model of the formula:


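\[
\begin{aligned}
& p, q, t, u, v \in C,\; r \in \{\mathit{true}, \mathit{false}\},\; a_p, a_q, a_t, a_u, a_v \in A : \\
& \mathit{init}(a_p) \land \mathit{init}(a_q) \land \mathit{mem}_A(p, a_p) \land \mathit{mem}_A(q, a_q) \;\land \\
& r = \mathit{conc}_{jump\_gt64}(p, q) \land \{a_{pt}, a_{pf}, a_{qt}, a_{qf}\} = \mathit{abs}_{jump\_gt64}(a_p, a_q) \;\land \\
& (r \Rightarrow (\mathit{mem}_A(p, a_{pt}) \land \mathit{mem}_A(q, a_{qt}) \land a_{p^o} = a_{pt} \land a_{q^o} = a_{qt})) \;\land \\
& (\neg r \Rightarrow (\mathit{mem}_A(p, a_{pf}) \land \mathit{mem}_A(q, a_{qf}) \land a_{p^o} = a_{pf} \land a_{q^o} = a_{qf})) \;\land \\
& (\mathit{init}(a_t) \lor \mathit{assign}(t, \{p^o, q^o\})) \land (\mathit{init}(a_u) \lor \mathit{assign}(u, \{p^o, q^o\})) \;\land \\
& \mathit{mem}_A(t, a_t) \land \mathit{mem}_A(u, a_u) \;\land \\
& v = \mathit{conc}_{or}(t, u) \land a_v = \mathit{abs}_{or}(a_t, a_u) \land \neg(\mathit{mem}_A(v, a_v))
\end{aligned}
\tag{7}
\]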
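
The branch part of the formula relies on a jump operator with four abstract outputs, one pair per outcome. The sketch below shows one plausible such operator on unsigned intervals and checks, by enumeration over a 3-bit domain, that the outputs for the taken branch contain the concrete operands. The operator `abs_jump_gt` is an illustrative assumption, not the kernel's implementation.

```python
from itertools import product

WIDTH = 3
VALUES = range(1 << WIDTH)
INTERVALS = [(lo, hi) for lo in VALUES for hi in VALUES if lo <= hi]


def mem(x, a):
    return a[0] <= x <= a[1]


def abs_jump_gt(a_p, a_q):
    """Refine intervals of p and q for the outcomes of `p > q` (unsigned).

    Returns (a_pt, a_pf, a_qt, a_qf). An infeasible outcome may produce an
    empty interval (lo > hi), which contains no concrete value.
    """
    (lp, hp), (lq, hq) = a_p, a_q
    a_pt = (max(lp, lq + 1), hp)   # true branch: p >= q + 1
    a_qt = (lq, min(hq, hp - 1))   # true branch: q <= p - 1
    a_pf = (lp, min(hp, hq))       # false branch: p <= q
    a_qf = (max(lq, lp), hq)       # false branch: q >= p
    return a_pt, a_pf, a_qt, a_qf


def branch_outputs_sound(abs_jump):
    """Check the two implications of the branch encoding for all small inputs."""
    for a_p, a_q in product(INTERVALS, repeat=2):
        a_pt, a_pf, a_qt, a_qf = abs_jump(a_p, a_q)
        for p, q in product(VALUES, repeat=2):
            if not (mem(p, a_p) and mem(q, a_q)):
                continue
            taken = p > q  # concrete branch outcome r
            out_p, out_q = (a_pt, a_qt) if taken else (a_pf, a_qf)
            if not (mem(p, out_p) and mem(q, out_q)):
                return False
    return True
```

The selected pair `(out_p, out_q)` plays the role of the abstract values that later instructions may consume once the branch outcome is fixed.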


Validation of Manifested Soundness Violations. The programs generated 
by our approach for bugs with known CVEs were similar to the proof-of-concept 
implementations found in these CVEs. For previously unknown bugs, we logged 
the kernel verifier’s state as it analyzes eBPF programs and also executed the 
eBPF program with the concrete operands produced by the SMT solver. We 
compared the parameters in the SMT solver’s model and those from the kernel 
verifier and run-time result. This process entailed manually compiling and boot- 
ing into each kernel version that we check, and running the generated programs. 
For the manifested bugs, we found exact agreement between the SMT model 
and the observed behaviors in all cases we checked. 


5 C to Logic for Kernel’s Abstract Operators 


To prove the soundness of the kernel’s abstract operators, we first have to extract 
the input-output semantics of the operators from the kernel’s implementation in 
C into first-order logic. It is tedious and error-prone to manually write down the 
formulas for each version of the kernel. Further, the verifier’s abstract semantics 
can change across versions. Hence, we automatically generate the first-order 
logic formula (in SMT-LIB format) directly from the verifier’s C source code. 
Modeling C code in general is hard [42,46,64]. However, we observe that it is 
sufficient to handle a subset of C for the verifier’s value-tracking routines. 


Verifier’s C Code for Value-tracking. The kernel uses two integers to rep- 
resent abstract values for each of the five domains (Sect. 3). These 10 integers 
are encapsulated in a structure named bpf_reg_state (reg_st for short). The tnum 
domain is further encapsulated within reg_st in a struct called tnum. This static
“register state” is maintained for each register in the eBPF program being ana- 
lyzed. The kernel has a single top-level function called adjust_scalar_min_max_vals 
(adjust_scalar for short) that is called for each abstract operator corresponding to 
ALU instructions [16]. This function takes three arguments: opcode and two reg- 
ister states named dst and src that track the abstract value in the destination and 
source register of the eBPF instruction, respectively. Depending on the opcode, 
one of several switch-cases is executed, which leads to instruction-specific func- 
tion calls that modify the abstract values in dst and src. None of the functions 
updating register state in the call-chain have recursion or loops. The kernel has 
a structured way of accessing the members of reg_st. We use these specific fea- 
tures to translate C code to logic. The structures of the corresponding functions 
for jumps (reg_set_min_max and descendants) are similar. 


Preprocessing the Verifier’s C Code. We use the LLVM compiler’s [47] 
intermediate representation (IR) because it allows us to handle complex C code 
and provides a collection of tools to modify, optimize, and analyze the IR. 
Figure 4(a) shows an overview of our tool’s pipeline. Consider the case where we 
want to generate the SMT-LIB file for the abstract operator corresponding to the 
32-bit bitwise OR instruction (bpf_or32). After obtaining the verifier’s code in IR 
(stage ①), we proceed to apply our custom IR-transforming passes (stage ②).
First, we remove functions that are not relevant to our purpose because they do 
not modify register state. Next, we inline all the function calls that adjust_scalar 
makes. Inlining is possible because there are no recursive functions or loops in 
the call-graph. Next, we need to create a slice of the verifier that is only con- 
cerned with bpf_or32. We inject an LLVM instruction in the entry basic block of 
adjust_scalar which sets the opcode to bpf_or32. LLVM’s optimizer removes all 
irrelevant code from this IR with constant propagation and dead-code elimina- 
tion. Next, we adapt a transformation pass from SeaHorn’s [42] codebase, which
allows us to lower memcpy instructions to a sequence of stores. The result is a 
single function in LLVM IR, which captures the action of the abstract operator 
given input abstract states (i.e., dst and src) for one instruction (bpf_or32). 


The LLVMToSMT Pass. In step ③, we use the theory of bitvectors to
generate the first-order logic formula for the function obtained from step ②.
Since we encode everything with bitvectors, we need a memory model to capture 
memory accesses. We model memory as a set of two disjoint regions pointed to 
by dst and src. Given that the memory is only accessed via the structure reg_st’s 
fields, we can further view memory as a set of named registers. This allows us to 
model the entire memory as a tree of bitvectors: the leaf nodes store bitvectors 
corresponding to the first-class members of reg_st (e.g. for u64_min), the non-leaf 
nodes store trees of aggregate types (e.g. for tnum). C struct member accesses 
in IR begin with a getelementptr (GEP) instruction, which calculates the pointer 
(address) of the struct’s member. We use an indexing similar to that used by 
GEP to identify the bitvector that corresponds to the accessed member.
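
A small sketch of this memory model, assuming a register state with seven fields of which field 4 is a two-member aggregate. The field count, the position of the aggregate, and the variable naming scheme are hypothetical and chosen only to match the indices that appear in Fig. 4(b).

```python
def make_reg_state(name, num_fields=7, aggregate_field=4):
    """Build a tree of bitvector names: leaves are strings, aggregates are lists."""
    tree = []
    for i in range(num_fields):
        if i == aggregate_field:
            tree.append([f"{name}.{i}.0", f"{name}.{i}.1"])  # nested struct
        else:
            tree.append(f"{name}.{i}")
    return tree


def gep_lookup(tree, indices):
    """Resolve GEP-style indices to the bitvector stored at that member.

    The first index steps over the base pointer (always 0 here); the remaining
    indices select struct members, descending one level per index.
    """
    first, *members = indices
    assert first == 0, "only element 0 of the pointed-to struct is modeled"
    node = tree
    for index in members:
        node = node[index]
    return node
```

For instance, `gep_lookup(make_reg_state("src"), [0, 4, 0])` resolves to the leaf named `src.4.0`, mirroring the index list `i64 0, i32 4, i32 0` of a getelementptr instruction.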


(a) Pipeline: verifier.c → ① compile with clang → verifier.ll → ② custom
    transformation passes → bpf_or32.ll → ③ LLVMToSMT pass → bpf_or32.smt2

(b)
     1. define void @adjust_scalar_bpf_or32(reg_st* %dst, reg_st* %src) {
     2. entry:
     3.   ; liveOnEntry
     4.   %x0 = getelementptr reg_st, reg_st* %src, i64 0, i32 4, i32 0
     5.   ; MemoryUse(liveOnEntry)
     6.   %x1 = load i64, i64* %x0
     7.   %x2 = icmp eq i64 %x1, 0
     8.   br i1 %x2, label %ltrue, label %lend
     9. ltrue:
    10.   %x4 = getelementptr reg_st, reg_st* %dst, i64 0, i32 5
    11.   ; 1 = MemoryDef(liveOnEntry)
    12.   store i64 0, i64* %x4
    13.   %x5 = getelementptr reg_st, reg_st* %dst, i64 0, i32 6
    14.   ; 2 = MemoryDef(1)
    15.   store i64 4294967295, i64* %x5
    16.   br label %lend
    17. lend:
    18.   ; 3 = MemoryPhi({entry, liveOnEntry},{%ltrue, 2})
    19.   ret void
    20. }

Fig. 4. (a) The pipeline for automatically generating an SMT-LIB file from the Linux
kernel’s verifier.c. Shown here is an instance of the pipeline for the bpf_or32 instruc-
tion. (b) The LLVM IR presented as a CFG, overlaid with MemorySSA analysis in
red (the ; comment lines), for a function adjust_scalar_bpf_or32 that is representative
of verifier code for bpf_or32. It takes as input two structs dst and src and modifies them.


Handling Straight Line Code and Branches. LLVM’s IR is already in SSA
form. Every IR instruction that produces a value defines a new temporary virtual
register. We create a fresh bitvector variable when we encounter a temporary in
the IR. Consider a simple addition instruction: %y = add i64 %x, 3. To encode
the instruction, we create a formula that asserts an equality between a fresh
bitvector $BV_y$ and the existing one $BV_x$, based on the semantics of the instruction:
$BV_y == BV_x + BV_{const3}$.
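
The encoding of the addition above could be emitted as SMT-LIB text along the following lines. This is a sketch: the helper names and the bitvector variable names are hypothetical, and the actual pass may build formulas through a solver API rather than strings.

```python
def bv_const(value, width):
    """SMT-LIB bitvector literal for `value`, reduced modulo 2**width."""
    return f"(_ bv{value % (1 << width)} {width})"


def encode_add(dest, lhs, rhs_const, width=64):
    """Encode `%dest = add iN %lhs, rhs_const` as a declaration plus an equality."""
    return [
        f"(declare-const {dest} (_ BitVec {width}))",
        f"(assert (= {dest} (bvadd {lhs} {bv_const(rhs_const, width)})))",
    ]


# Usage: the instruction %y = add i64 %x, 3
for line in encode_add("BV_y", "BV_x", 3):
    print(line)
```

Because each SSA temporary is defined once, one fresh constant and one equality per instruction are enough for straight-line code.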

To handle branches, we precondition the SMT formula for each basic block 
with its path condition. As the IR we analyze does not contain loops, the control 
flow graph (CFG) is a directed acyclic graph. Hence, the path condition of each 
basic block is a disjunction of path conditions flowing through each incoming 
edge into the node corresponding to that block in the CFG. Phi nodes (φ’s) in
SSA merge the values flowing in from various paths. We use the phi instructions 
in IR to merge incoming values. We calculate an “edge condition” formula for each 
incoming edge to the phi. Then, we encode the phi instruction by appropriately 
setting the bitvector to the incoming values based on the edge condition. 
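
One way to compute path conditions and edge conditions over an acyclic CFG is sketched below, using SMT-LIB-style strings. The function names, the guard strings, and the use of implications for the phi encoding are assumptions for illustration.

```python
def path_conditions(blocks, edges):
    """Compute path and edge conditions for a DAG-shaped CFG.

    blocks: block names in topological order, entry block first.
    edges:  maps (src, dst) to the branch guard that selects that edge.
    """
    path_cond = {blocks[0]: "true"}
    edge_cond = {}
    for block in blocks[1:]:
        incoming = []
        for (src, dst), guard in edges.items():
            if dst == block:
                cond = f"(and {path_cond[src]} {guard})"
                edge_cond[(src, dst)] = cond
                incoming.append(cond)
        # A block is reached if any one of its incoming edges is taken.
        path_cond[block] = (incoming[0] if len(incoming) == 1
                            else "(or " + " ".join(incoming) + ")")
    return path_cond, edge_cond


def encode_phi(dest, incoming, edge_cond):
    """Under each edge condition, dest equals the value flowing in on that edge."""
    return [f"(assert (=> {edge_cond[edge]} (= {dest} {value})))"
            for edge, value in incoming]


# CFG shaped like Fig. 4(b): entry branches on x2 to ltrue or lend.
BLOCKS = ["entry", "ltrue", "lend"]
EDGES = {("entry", "ltrue"): "x2",
         ("entry", "lend"): "(not x2)",
         ("ltrue", "lend"): "true"}
```

Processing blocks in topological order guarantees that the path condition of every predecessor is available when a block is visited.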


Handling Memory Access Instructions. Our tool leverages LLVM’s Mem- 
orySSA analysis [17] to handle loads and stores. The MemorySSA pass creates 
new versions of memory upon stores and branch merges, associates load instruc- 
tions with specific versions, and provides a memory dependence graph between 
the memory versions. Figure 4(b) shows an example CFG in IR overlaid with
MemorySSA analysis (red). We maintain a one-to-one mapping between the dif- 
ferent versions of memory presented by MemorySSA, and versions of our memory 
model consisting of bitvector-trees. liveOnEntry (line 3) is the memory version 
at the start of the function. The bitvectors in the corresponding bitvector-tree
are the input operands for the kernel’s abstract operators. 

Every load instruction is annotated with a MemoryUse (e.g. the load instruction
on line 6 reads from the liveOnEntry memory version), and preceded by a
GEP. Thus, we choose the appropriate bitvector-tree and index into it to obtain
the appropriate bitvector (say $BV_{src0}$). We encode the load instruction as:
($BV_{x1} == BV_{src0}$). A store instruction (e.g. line 12, annotated using a MemoryDef) modifies an
existing memory version (liveOnEntry) to create a new version (1). We create a new
bitvector-tree and map it to version 1. The bitvectors in this bitvector-tree are 
exactly the same as liveOnEntry’s, except for the bitvector in the location that 
the store modifies. The latter bitvector is replaced with the bitvector mapped 
to the temporary used for the store. For a MemoryPhi node (e.g. line 18, creating 
version 3), we create a new bitvector-tree for the latest memory version (e.g. 3). 
Similar to regular phi nodes, we use the edge condition of the incoming edges to 
conditionally set each bitvector in the new bitvector-tree to the corresponding 
bitvector in the memory version propagated through that edge. 

The bitvector-tree corresponding to the active memory version at the point 
of the (unique) ret instruction (e.g. 3 in the lend block) contains the output 
operands for the kernel’s abstract operators. 
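
The versioning scheme can be sketched with memory versions represented as dictionaries from member paths to terms, a flattened stand-in for the bitvector-trees. The helper names, the path keys, and the condition names `pc_entry_lend` and `pc_ltrue_lend` are hypothetical.

```python
def store(memory, path, term):
    """MemoryDef: copy the previous version and replace one location."""
    new_version = dict(memory)
    new_version[path] = term
    return new_version


def memory_phi(incoming):
    """MemoryPhi: merge versions location by location using edge conditions.

    incoming: list of (edge_condition, memory_version); the last entry acts
    as the fallback of the nested ite.
    """
    *guarded, (_, fallback_memory) = incoming
    merged = {}
    for path, term in fallback_memory.items():
        for cond, memory in reversed(guarded):
            if memory[path] != term:  # identical terms need no ite
                term = f"(ite {cond} {memory[path]} {term})"
        merged[path] = term
    return merged


# Versions for the IR in Fig. 4(b).
live_on_entry = {("dst", 5): "dst_5_in", ("dst", 6): "dst_6_in",
                 ("src", 4, 0): "src_4_0_in"}
v1 = store(live_on_entry, ("dst", 5), "(_ bv0 64)")        # line 12
v2 = store(v1, ("dst", 6), "(_ bv4294967295 64)")          # line 15
v3 = memory_phi([("pc_entry_lend", live_on_entry),         # line 18
                 ("pc_ltrue_lend", v2)])
```

Here `v3` corresponds to the memory version that is live at the ret instruction, so its entries are the output operands of the abstract operator.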


6 Experimental Evaluation 


Our prototype, Agni [18,72], automatically checks the soundness of the value 
tracking algorithms in various versions of the kernel eBPF verifier. It uses LLVM 
12 [47] for the C to logic translation and the Z3 SMT solver [36] for checking 
formulas. The source code for our prototype is publicly available [18,72]. We
evaluate Agni to determine its effectiveness in checking the soundness of the kernel
verifier and its ability to generate eBPF programs that manifest soundness
violations (which we call proof-of-concepts, or POCs).


Checking Soundness Across Kernel Versions. We have automatically 
checked the soundness of all combinations of abstract operators and abstract 
domains for kernels between versions 4.14 and 5.19. Figure 5(a) provides a sum- 
mary of our results. To keep the size of the table short, we only report kernel 
versions starting from 4.14 that are known to have a documented CVE or a bug 
that is distinct from one in a prior kernel version (4.14, 5.5, 5.7-rc1, 5.8, ...).
We evaluated intermediate kernel versions that are not reported; our tool can
support all kernel versions between 4.14 and 5.19 (the latest as of this writing).

We compare our generic soundness specification (Sect. 4.1, labeled gen in 
columns 2,4,6) and the refined one (Sect. 4.2, labeled sro in columns 3,5,7). A 
kernel with at least one potentially unsound domain or operator is considered 
unsound (columns 2 and 3). Operator+domain pairs that violated the soundness 
specification are reported in columns 4 and 5. Those operators that violated 
soundness in at least one domain are reported in columns 6 and 7. 

(a)
Kernel      Sound?       Num. of violations   Num. of unsound operators
version     gen   sro    gen    sro           gen    sro
4.14        ✗     ✗      23     21            9      7
5.5         ✗     ✗      32     30            12     10
5.7-rc1     ✗     ✗      101    99            31     31
5.7         ✗     ✗      69     67            15     15
5.8         ✗     ✗      69     67            15     15
5.9         ✗     ✗      67     65            15     15
5.10        ✗     ✗      74     65            17     17
5.10-rc1    ✗     ✗      74     71            17     17
5.11        ✗     ✗      71     62            16     16
5.12        ✗     ✗      71     62            16     16
5.13        ✗     ✓      9      0             6      0
5.14        ✗     ✓      9      0             6      0
5.16        ✗     ✓      9      0             6      0
5.17        ✗     ✓      9      0             6      0
5.18        ✗     ✓      9      0             6      0
5.19        ✗     ✓      9      0             6      0

(b)
Kernel      Num. of total   All POCs    Program length
version     violations      synth?      1      2      3
4.14        21              ✗           ?      ?      ?
5.5         30              ✗           0      20     2
5.7-rc1     99              ✓           55     44     0
5.7         67              ✓           39     28     0
5.8         67              ✓           39     28     0
5.9         65              ✓           39     26     0
5.10        65              ✓           19     44     2
5.10-rc1    71              ✓           39     32     0
5.11        62              ✓           16     44     2
5.12        62              ✓           16     44     2

Fig. 5. (a) Soundness violations detected with the generic soundness specification
(Sect. 4.1, labeled gen) in comparison to the refined specification (Sect. 4.2, labeled
sro). We show the number of violating operator+domain pairs (columns 4-5) and
number of unsound operators (columns 6-7). (b) Number of generated POCs and their
lengths for unsound operator+domains after sro checks.


All kernel versions including the latest ones are unsound with respect to
the generic soundness specification (column 2). Even in one of the latest versions
of the kernel (v5.19), 6 operators corresponding to bpf_xor64, bpf_xor32,
bpf_and64, bpf_or64, bpf_or32, and bpf_and32 are unsound according to the
generic soundness specification (column 6, row of kernel version 5.19). Refining
the soundness specification enables us to prove the soundness of all operators in
kernels newer than 5.13 (column 3). However, even the refined specification reports
violations for older kernels. Among those violations, 27 were previously unknown. A single
wrong abstract operator can violate the soundness of many abstract domains (up
to 5). The refined (sro) specification reduces the reported soundness violations
by ~6.8% in potentially unsound kernel versions and by 100% in sound ones.

We observed that the 64-bit jump instructions and 64-bit/32-bit bitwise
instructions exhibited the largest number of soundness violations. The unsound- 
ness persisted across multiple kernel versions (until eventually patched). 


Generating POCs for Unsound kernels. We evaluate the ability of differ- 
ential synthesis (Sect. 4.3) to generate eBPF programs that manifest soundness 
bugs. Figure 5(b) summarizes our results. Starting with operator+domain pairs 
from soundness violations uncovered by sro (column 2), we report whether all 
operator+domain violations were successfully manifested using POCs (column 
3) and the lengths of the POCs successfully generated (columns 4,5,6). We pro- 
duced a POC for ~ 97% of soundness violations across kernel versions (validated 
as described in Sect. 4.3). The smallest POCs for many violations require multi- 
instruction programs. For example, none of the soundness violations in version 
5.5 may be manifested with a single eBPF instruction. We generated a POC for 
all soundness violations for all but 2 versions of the kernel (for versions 4.14 and 
5.5, we generated a POC for all but 3 and 8 violations respectively). The ability 
to manifest almost all of the reported sro violations speaks to the significance 
and precision of the refinement in the soundness specification. Our differential 
synthesis technique may enable developers to experiment with concrete eBPF
programs to validate and debug unsound behaviors in the kernel verifier. 

Some bugs in the eBPF verifier are well known security vulnerabilities and 
have known POCs [51,62]. We have generated a POC, of equal or lesser size, for 
all known CVEs in the kernel versions analyzed. For example, we have generated 
a POC for a well known bug with two instructions instead of four [62]. 


Time Taken to Verify kernels and Generate POCs. We conducted our 
experiments on the Cloudlab [37] testbed, using a machine with two 10-core Intel 
Skylake CPUs running at 2.20GHz with 192GB of memory. When using the 
generic soundness specifications, 90% of the abstract operators (eBPF instruc- 
tions) were checked for soundness within ~ 100 minutes. If deemed unsound, 
the refined specification was checked in ~ 30 minutes for ~ 90% of the unsound 
operators. At the extreme, verifying some operators, as well as finding a POC
for some soundness violations, may take a long time (2000 min or more). We
attribute this to the significant size of the SMT-LIB formulas that are generated.
We were able to find POCs for 90% of the soundness violations in kernel
versions 5.7-rc1 through 5.12 within a few hours.


7 Limitations and Caveats 


The results in this paper must be interpreted with the following caveats. 


Only Range Analysis is Considered. There are other static analyses in the 
kernel verifier beyond range analysis (Sect. 1). These include tracking register live- 
ness for reading and writing, and detecting speculative execution vulnerabilities. 


Coverage of eBPF Abstract Operators. We exclude verifying the soundness 
of the abstract operators corresponding to multiplication as they cause our SMT 
verifications to time out. This is primarily due to the presence of 64-bit bitvector 
multiplication in the SMT encoding of these operators. We have verified their 
soundness using 8-bit bitvectors. Our results on (un)soundness cover all other 
abstract arithmetic, logic, and branching operators (Sect. 4.1). 
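
The idea of checking an operator at a reduced bit-width can be illustrated by exhaustive enumeration, which is feasible for very small widths. The sketch below uses 4-bit values and a simple overflow-guarded interval multiplication; both the operator and the use of enumeration instead of an SMT query are assumptions for illustration and do not reproduce the kernel's code or the tool's 8-bit queries.

```python
WIDTH = 4
LIMIT = 1 << WIDTH            # concrete values are 0 .. LIMIT - 1
HALF = 1 << (WIDTH // 2)      # operands below HALF cannot overflow when multiplied
TOP = (0, LIMIT - 1)
INTERVALS = [(lo, hi) for lo in range(LIMIT) for hi in range(lo, LIMIT)]


def conc_mul(x, y):
    """Concrete multiplication with wrap-around at WIDTH bits."""
    return (x * y) % LIMIT


def abs_mul(a, b):
    """Interval multiplication that gives up (returns top) on possible overflow."""
    if a[1] >= HALF or b[1] >= HALF:
        return TOP
    return (a[0] * b[0], a[1] * b[1])


def is_sound(abs_op, conc_op):
    """Check every pair of intervals against every pair of member values."""
    for a in INTERVALS:
        for b in INTERVALS:
            lo, hi = abs_op(a, b)
            for x in range(a[0], a[1] + 1):
                for y in range(b[0], b[1] + 1):
                    if not lo <= conc_op(x, y) <= hi:
                        return False
    return True
```

A result at a small width is only evidence, not a proof, for the 64-bit operator, which is why this limitation is listed as a caveat.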


Trusted Computing Base. Our C to SMT translation (Sect. 5) and soundness 
proofs have software dependencies including the LLVM compiler infrastructure, 
the Z3 solver, and our translation passes, which together form our trusted com- 
puting base. We have unit tested our C-to-SMT translations extensively. We 
validated our synthesized POCs by manually executing them in Linux kernels 
running inside the QEMU emulator, replicating the soundness bugs. Despite our 
best efforts, it is possible that there are bugs in our software infrastructure. 


Incompleteness of Differential Synthesis. The differential synthesis app- 
roach is incomplete (Sect. 4.3). If our refined verification condition (Eq. (4)) finds 
an operator unsound, and the synthesis is unable to produce a POC, there are 
two possibilities. First, there may be long programs which could manifest the 
unsound behavior. Our enumerative algorithm currently times out for programs 
of length > 4. Second, it is possible that the bug cannot be manifested with 
any concrete eBPF program, and is reported due to overapproximation in the 
soundness specification. 
8 Related Work


Closest Related Work. The two closely related prior works are: (1) a paper 
on tnum verification [71], and (2) a recent manuscript on verifying range analy- 
sis [21]. The tnum paper explores formal verification for a single abstract domain: 
tnums. The recent manuscript [21] also aims to prove the soundness of the eBPF 
verifier’s value-tracking. In contrast, our work differs by (1) exposing the
non-modular nature of the abstract operators in the kernel, (2) proposing a
method to reason about abstract operators for both arithmetic and branches,
(3) automatically generating VCs from kernel source code, and (4) synthesizing
eBPF programs that exercise the divergence of abstract and concrete semantics.


Safety of eBPF Programs And Static Analyzers. eBPF compilation and 
interpreter safety has been a site of recent endeavors [59,60,69, 73,74]. PRE- 
VAIL [39] uses abstract interpretation using the zone abstract domain for check- 
ing safety outside the kernel. In contrast, we focus on proving the soundness of 
the in-kernel verifier. 


Abstract Interpretation And Domain Refinement. Prior work on abstract 
interpretation [30,31,33] and value-tracking abstract domains [55,56,68] has
indirectly influenced the eBPF verifier’s design [61,71]. The idea of combin- 
ing abstract domains to enhance the precision of abstract representations was 
first introduced by Cousot with the reduced product and disjunctive completion 
domain refinements [29,34] and further improved by others [70]. A systematic 
survey on product abstract operators is also available [28]. Specifically, we tailor 
our work to verify the abstract operators in the Linux kernel. 


C to First-order Logic. Similar to our approach that generates first-order- 
logic formulas from C code, prior tools also generate verification conditions from 
C code [42,46, 54,64]. A few of them, SMACK [64] and SeaHorn [42], use LLVM 
IR for this purpose. These tools support a rich subset of C. They typically model 
memory as a linear array of bytes, which is not ideal for modeling kernel source 
code. We explore a subset of C that is sufficient to handle kernel code and still 
generates queries using only the bitvector theory, which enables us to efficiently 
verify soundness for multiple versions of the kernel. 


9 Conclusion 


We present a fully automated method to verify the soundness of range analy- 
sis in the Linux kernel’s eBPF verifier. We are able to check the soundness of 
multiple kernel versions automatically because we generate the verification con- 
ditions for the abstract operators directly from the kernel C code. We develop 
specifications for reasoning about soundness when multiple abstract domains 
are combined in a non-modular fashion in the kernel. Our refinement to this 
specification, capturing preconditioning in the kernel, proves the soundness of 
recent Linux kernels. We also successfully generate concrete eBPF programs 
that demonstrate the divergence between abstract and concrete semantics when
soundness checks fail. Our next step is to push for incorporating this approach 
in the kernel development process, to help eliminate verifier bugs during code 
review. 
Abstract. Masking is a widely used and effective countermeasure against
power side-channel attacks for implementing cryptographic algorithms.
Surprisingly, few formal verification techniques have addressed a fundamental
question, i.e., whether the masked program and the original
(unmasked) cryptographic algorithm are functionally equivalent. In this
paper, we study this problem for masked arithmetic programs over Galois
fields of characteristic 2. We propose an automated approach based on
term rewriting, aided by random testing and SMT solving. The overall
approach is sound, and complete under certain conditions which are met
in practice. We implement the approach as a new tool FISCHER and carry
out extensive experiments on various benchmarks. The results confirm
the effectiveness, efficiency and scalability of our approach. Almost all
the benchmarks can be proved for the first time by the term rewriting
system alone. In particular, FISCHER detects a new flaw in a masked
implementation published in EUROCRYPT 2017.


1 Introduction 


Power side-channel attacks [42] can infer secrets by statistically analyzing the
power consumption during the execution of cryptographic programs. The victims
include implementations of almost all major cryptographic algorithms, e.g.,
DES [41], AES [54], RSA [33], Elliptic curve cryptography [46,52] and
post-quantum cryptography [56,59]. To mitigate the threat, cryptographic algorithms
are often implemented via masking [37], which divides each secret value into
(d + 1) shares by randomization, where d is a given masking order. However, it
is error-prone to implement secure and correct masked implementations for
non-linear functions (e.g., finite-field multiplication, modular addition and S-Box),
which are prevalent in cryptography. Indeed, published implementations of AES
S-Box that have been proved secure via paper-and-pencil [19,40,58] were later
shown to be vulnerable to power side-channels when d is no less than 4 [24].
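
As a brief illustration of splitting a secret into (d + 1) shares, the sketch below uses Boolean (XOR) masking of byte values. The function names and the choice of 8-bit values are assumptions for illustration.

```python
import secrets
from functools import reduce
from operator import xor


def mask(secret, d, bits=8):
    """Split `secret` into d + 1 shares whose XOR equals the secret."""
    shares = [secrets.randbits(bits) for _ in range(d)]   # d random shares
    shares.append(reduce(xor, shares, secret))            # last share fixes the XOR
    return shares


def unmask(shares):
    """Recombine shares by XOR."""
    return reduce(xor, shares, 0)
```

Any d of the shares are uniformly random on their own; only the XOR of all d + 1 shares reveals the secret.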

While numerous formal verification techniques have been proposed to prove
resistance of masked cryptographic programs against power side-channel attacks
(e.g., [7, 13, 26, 29-32, 64]), one fundamental question which is largely left open is
the (functional) correctness of the masked cryptographic programs, i.e., whether
a masked program and the original (unmasked) cryptographic algorithm are
actually functionally equivalent. It is conceivable to apply general-purpose
program verifiers to masked cryptographic programs. Constraint-solving based
approaches are available, for instance, Boogie [6] generates constraints via weak- 
est precondition reasoning which then invokes SMT solvers; SeaHorn [36] and 
CPAChecker [12] adopt model checking by utilizing SMT or CHC solvers. More 
recent work (e.g., CryptoLine [28,45,53,62]) resorts to computer algebra, e.g., 
to reduce the problem to the ideal membership problem. The main challenge of 
applying these techniques to masked cryptographic programs lies in the pres- 
ence of finite-field multiplication, affine transformations and bitwise exclusive- 
OR (XOR). For instance, finite-field multiplication is not natively supported by 
the current SMT or CHC solvers, and the increasing number of bitwise XOR 
operations causes the infamous state-explosion problem. Moreover, to the best of 
our knowledge, current computer algebra systems do not provide the full support 
required by verification of masked cryptographic programs. 
Contributions. We propose a novel, term rewriting based approach to efficiently
check whether a masked program and the original (unmasked) cryptographic
algorithm (over Galois fields of characteristic 2) are functionally equivalent.
Namely, we provide a term rewriting system (TRS) which can handle affine
transformations, bitwise XOR, and finite-field multiplication. The verification
problem is reduced to checking whether a term can be rewritten to normal form
0. This approach is sound, i.e., once we obtain 0, we can claim functional
equivalence. In case the TRS reduces to a normal form which is different from 0,
most likely they are not functionally equivalent, but a false positive is possible.
We further resort to random testing and SMT solving by directly analyzing the
obtained normal form. As a result, it turns out that the overall approach is
complete if no uninterpreted functions are involved in the normal form.

We implement our approach as a new tool FISCHER (FunctIonality of
maSked CryptograpHic program verifiER), based on the LLVM framework [43].
We conduct extensive experiments on various masked cryptographic program
benchmarks. The results show that our term rewriting system alone is able
to prove almost all the benchmarks. FISCHER is also considerably more efficient
than the general-purpose verifiers SMACK [55], SeaHorn, CPAChecker, and
Symbiotic [22], the cryptography-specific verifier CryptoLine, as well as a
straightforward approach that directly reduces the verification task to SMT solving. For
instance, our approach is able to handle masked implementations of finite-field
multiplication with masking orders up to 100 in less than 153s, while none of
the compared approaches can handle a masking order of 3 in 20 min.

In particular, for the first time we detect a flaw in a masked implementation of
finite-field multiplication published in EUROCRYPT 2017 [8]. The flaw is tricky,
as it only occurs for the masking order d ≡ 1 mod 4.¹ This finding highlights
the importance of the correctness verification of masked programs, which has
been largely overlooked, and for which our work provides an effective solution.

Our main contributions can be summarized as follows. 


— We propose a term rewriting system for automatically proving the functional 
correctness of masked cryptographic programs; 

— We implement a tool FISCHER by synergistically integrating the term rewrit- 
ing based approach, random testing and SMT solving; 

— We conduct extensive experiments, confirming the effectiveness, efficiency, 
scalability and applicability of our approach. 


Related Work. Program verification has been extensively studied for decades. 
Here we mainly focus on their application in cryptographic programs, for which 
some general-purpose program verifiers have been adopted. Early work [3] uses 
Boogie [6]. HACL* [65] uses F* [2] which verifies programs by a combination of 
SMT solving and interactive proof assistants. Vale [15] uses F* and Dafny [44] 
where Dafny harnesses Boogie for verification. Cryptol [61] checks equivalence 
between machine-readable cryptographic specifications and real-world imple- 
mentations via SMT solving. As mentioned before, computer algebra systems 
(CAS) have also been used for verifying cryptographic programs and arithmetic 
circuits, by reducing to the ideal membership problem together with SAT/SMT
solving. Typical work includes CryptoLine and AMulet [38,39]. However, as 
shown in Sect. 7.2, neither general-purpose verifiers (SMACK with Boogie and 
Corral, SeaHorn, CPAChecker and Symbiotic) nor the CAS-based verifier Cryp- 
toLine is sufficiently powerful to verify masked cryptographic programs. Interac- 
tive proof assistants (possibly coupled with SMT solvers) have also been used to 
verify unmasked cryptographic programs (e.g., [1,4,9,23,27,48,49]). Compared 
to them, our approach is highly automatic, which is more acceptable and easier 
to use for general software developers. 


Outline. Section 2 recaps preliminaries. Section 3 presents a language on which
the cryptographic program is formalized. Section 4 gives an example and an
overview of our approach. Sections 5 and 6 introduce the term rewriting
system and verification algorithms. Section 7 reports experimental results. We
conclude in Sect. 8. The source code of our tool and benchmarks are available at
https://github.com/S3L-official/FISCHER.


2 Preliminaries 


For two integers $l, u$ with $l < u$, $[l, u]$ denotes the set of integers $\{l, l+1, \ldots, u\}$.

Galois Field. A Galois field $\mathrm{GF}(p^n)$ comprises polynomials
$a_{n-1}X^{n-1} + \cdots + a_1X^1 + a_0$ over $\mathbb{Z}_p = [0, p-1]$, where $p$ is a prime number, $n$ is a
positive integer, and $a_i \in \mathbb{Z}_p$. (Here $p$ is the characteristic of the field, and $p^n$
is the order of the field.) Symmetric cryptography (e.g., DES [50], AES [25],
SKINNY [10], PRESENT [14]) and bitsliced implementations of asymmetric
cryptography (e.g., [17]) intensively use $\mathrm{GF}(2^n)$. Throughout the paper, $\mathbb{F}$
denotes the Galois field $\mathrm{GF}(2^n)$ for a fixed $n$, and $\oplus$ and $\otimes$ denote the addition
and multiplication on $\mathbb{F}$, respectively. Recall that $\mathrm{GF}(2^n)$ can be constructed
from the quotient ring of the polynomial ring $\mathrm{GF}(2)[X]$ with respect to the ideal
generated by an irreducible polynomial $P$ of degree $n$. Hence, multiplication is
the product of two polynomials modulo $P$ in $\mathrm{GF}(2)[X]$ and addition is bitwise
exclusive-OR (XOR) over the binary representation of polynomials. For example,
AES uses $\mathrm{GF}(256) = \mathrm{GF}(2)[X]/(X^8 + X^4 + X^3 + X + 1)$. Here $n = 8$ and
$P = X^8 + X^4 + X^3 + X + 1$.

1 This flaw has been confirmed by an author of [8].
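
As an illustration of the field arithmetic described in the Galois Field paragraph, the following minimal Python sketch implements addition and multiplication in GF(2^8) with the AES polynomial. The function names `gf_add` and `gf_mul` are ours and are not part of FISCHER; field elements are assumed to be encoded as integers whose bits are the polynomial coefficients.

```python
AES_POLY = 0x11B  # bit encoding of X^8 + X^4 + X^3 + X + 1


def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^n) is bitwise XOR of the coefficient vectors."""
    return a ^ b


def gf_mul(a: int, b: int, poly: int = AES_POLY, n: int = 8) -> int:
    """Multiply two elements of GF(2^n): shift-and-add, reducing modulo poly."""
    result = 0
    for _ in range(n):
        if b & 1:            # current coefficient of b is 1: add the shifted a
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by X
        if a >> n:           # a term of degree n appeared: reduce modulo poly
            a ^= poly
    return result
```

For instance, `gf_mul(0x57, 0x83)` evaluates to `0xC1`, the worked multiplication example from the AES specification.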

Higher-Order Masking. To achieve order-$d$ security against power side-channel
attacks under certain leakage models, masking is usually used [37,60].
Essentially, masking partitions each secret value into (usually $d+1$) shares so
that knowing at most $d$ shares reveals no information about the secret value;
this is called order-$d$ masking. In Boolean masking, a value $a \in \mathbb{F}$ is divided into shares
$a_0, a_1, \ldots, a_d \in \mathbb{F}$ such that $a_0 \oplus a_1 \oplus \cdots \oplus a_d = a$. Typically, $a_1, \ldots, a_d$ are random
values and $a_0 = a \oplus a_1 \oplus \cdots \oplus a_d$. The tuple $(a_0, a_1, \ldots, a_d)$, denoted by $\mathbf{a}$, is called
an encoding of $a$. We write $\bigoplus_{i \in [0,d]} a_i$ (or simply $\bigoplus \mathbf{a}$) for $a_0 \oplus a_1 \oplus \cdots \oplus a_d$.
Additive masking can be defined similarly to Boolean masking, where $\oplus$ is replaced
by the modular arithmetic addition operator. In this work, we focus on Boolean
masking as the XOR operation is more efficient to implement.
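
The construction of an encoding can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the helper names `encode` and `decode` are hypothetical, elements of GF(2^n) are represented as n-bit integers, and the standard-library `secrets` module stands in for the source of uniform randomness.

```python
import secrets
from functools import reduce
from operator import xor


def encode(a: int, d: int, n: int = 8) -> list:
    """Split a into d+1 Boolean shares (a_0, a_1, ..., a_d)."""
    randoms = [secrets.randbelow(1 << n) for _ in range(d)]  # a_1, ..., a_d
    a0 = reduce(xor, randoms, a)       # a_0 = a XOR a_1 XOR ... XOR a_d
    return [a0] + randoms


def decode(shares: list) -> int:
    """Recombine an encoding: the XOR of all shares is the secret value."""
    return reduce(xor, shares, 0)
```

By construction, `decode(encode(a, d))` returns `a` for every order `d`, while any `d` of the `d + 1` shares are uniformly distributed.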

To implement a masked program, for each operation in the cryptographic
algorithm, a corresponding operation on shares is required. As we will see later,
when the operation is affine (i.e., the operation $f$ satisfies $f(x \oplus y) = f(x) \oplus f(y) \oplus c$
for some constant $c$), the corresponding operation is simply to apply the original
operation on each share $a_i$ in the encoding $(a_0, a_1, \ldots, a_d)$. However, for
non-affine operations (e.g., multiplication and addition), it is a very difficult and
error-prone task [24]. Ishai et al. [37] proposed the first masked implementation of
multiplication, but limited to the domain GF(2) only. The number of required
random values and operations is not optimal, and the scheme is known to be vulnerable in
the presence of glitches because the electric signals propagate at different speeds
in the combinatorial paths of hardware circuits. Thus, various follow-up papers
proposed ways to implement higher-order masking for the domain $\mathrm{GF}(2^n)$ and/or
to optimize the computational complexity, e.g., [8,11,21,34,58], all of which are
referred to as the ISW scheme in this paper. In another research direction, new
glitch-resistant Boolean masking schemes have been proposed, e.g., Hardware 
Private Circuits (HPC1 & HPC2) [20], Domain-oriented Masking (DOM) [35] 
and Consolidating Masking Schemes (CMS) [57]. In this work, we are interested 
in automatically proving the correctness of the masked programs. 


3 The Core Language 


In this section, we first present the core language MSL, given in Fig. 1, based on 
which the verification problem is formalized. 

⟨expr⟩   ::= ⟨var⟩ | ⟨num⟩ | ⟨expr⟩ ⊕ ⟨expr⟩ | ⟨expr⟩ ⊗ ⟨expr⟩ | (⟨expr⟩)
⟨stmts⟩  ::= ⟨var⟩ ← ⟨expr⟩ | ⟨var⟩ ← rand | ⟨stmts⟩ ⟨stmts⟩
           | ⟨var⟩ ← ⟨id⟩_proc(⟨vars⟩) | ⟨var⟩ ← ⟨id⟩_affine(⟨var⟩)
⟨proc⟩   ::= proc ⟨id⟩ input ⟨ids⟩ output ⟨id⟩ ⟨stmts⟩_origin shares ⟨num⟩ ⟨stmts⟩_masked
⟨affine⟩ ::= affine ⟨id⟩ [input ⟨id⟩ output ⟨id⟩ ⟨stmts⟩]


Fig. 1. Syntax of MSL in Backus-Naur form 


A program $P$ in MSL is given by a sequence of procedure definitions and
affine transformation definitions/declarations. A procedure definition starts with
the keyword proc, followed by a procedure name, a list of input parameters, an
output and its body. The procedure body has two blocks of statements, separated
by a special statement shares $d+1$, where $d$ is the masking order. The first block
⟨stmts⟩_origin, called the original block, implements its original functionality on
the input parameters without masking. The second block ⟨stmts⟩_masked, called
the masked block, is a masked implementation of the original block over the
input encodings $\mathbf{x}$ of the input parameters $x$. The input parameters and output
$x$, declared using the keywords input and output respectively, are scalar variables
in the original block, but are treated as the corresponding encodings (i.e., tuples)
$\mathbf{x}$ in the masked block. For example, input $x$ declares the scalar variable $x$ as
the input of the original block, while it implicitly declares an encoding
$\mathbf{x} = (x_0, x_1, \ldots, x_d)$ as the input of the masked block with shares $d+1$.

We distinguish affine transformation definitions and declarations. The former
starts with the keyword affine, followed by a name $f$, an input, an output and its
body. It is expected that the affine property $\forall x, y \in \mathbb{F}.\ f(x \oplus y) = f(x) \oplus f(y) \oplus c$
holds for some affine constant $c \in \mathbb{F}$. (Note that the constant $c$ is not explicitly
provided in the program, but can be derived, cf. Sect. 6.2.) The transformation
$f$ is linear if its affine constant $c$ is 0. In contrast, an affine transformation
declaration $f$ simply declares a transformation. As a result, it can only be used
to declare a linear one (i.e., $c$ must be 0), which is treated as an uninterpreted
function. Note that non-linear affine transformation declarations can be achieved
by combining declarations of linear affine transformations with affine transformation definitions.
Affine transformation here serves as an abstraction to capture complicated oper- 
ations (e.g., shift, rotation and bitwise Boolean operations) and can accelerate 
verification by expressing operations as uninterpreted functions. In practice, a 
majority of cryptographic algorithms (in symmetric cryptography) can be rep- 
resented by a composition of S-box, XOR and linear transformation only. 

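An affine transformation can be masked simply by applying it to the input
encoding in a share-wise way; namely, the masked version of the affine transformation $f(a)$ is

\[
f\Big(\bigoplus_{i \in [0,d]} a_i\Big) =
\begin{cases}
f(a_0) \oplus f(a_1) \oplus \cdots \oplus f(a_d), & \text{if } d \text{ is even;} \\
f(a_0) \oplus f(a_1) \oplus \cdots \oplus f(a_d) \oplus c, & \text{if } d \text{ is odd.}
\end{cases}
\]
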

260 M. Liu et al. 


This is the default, so an affine transformation definition contains only the
original block and no masked block.
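
The parity condition on $d$ follows from applying the affine property $d$ times, each application contributing one copy of $c$. The Python sketch below illustrates this on a hypothetical affine map over GF(2^8) (a one-bit rotation followed by XOR with the constant 0x63, so that $c$ = 0x63); the map and the helper names are our own choices for illustration and do not come from the benchmarks.

```python
from functools import reduce
from operator import xor

C = 0x63  # affine constant of the illustrative map f below


def rotl8(x: int) -> int:
    """Rotate an 8-bit value left by one bit (linear over GF(2))."""
    return ((x << 1) | (x >> 7)) & 0xFF


def f(x: int) -> int:
    """Affine map: f(x ^ y) == f(x) ^ f(y) ^ C for all 8-bit x, y."""
    return rotl8(x) ^ C


def masked_f(shares: list) -> list:
    """Share-wise masked version of f for an encoding with d + 1 shares."""
    d = len(shares) - 1
    out = [f(s) for s in shares]
    if d % 2 == 1:       # odd d: one extra copy of c must be XORed in
        out[0] ^= C
    return out


def decode(shares: list) -> int:
    """XOR of all shares."""
    return reduce(xor, shares, 0)
```

For any encoding `shares`, `decode(masked_f(shares))` equals `f(decode(shares))`, regardless of whether the masking order is even or odd.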

A statement is either an assignment or a function call. MSL features two
types of assignments, which are either of the form $x \leftarrow e$ defined as usual or of
the form $r \leftarrow$ rand, which assigns a uniformly sampled value from the domain
$\mathbb{F}$ to the variable $r$. As a result, $r$ should be read as a random variable. We
assume that each random variable is defined only once. We note that the actual
parameters and output are scalar if the procedure is invoked in an original block
while they are the corresponding encodings if it is invoked in a masked block.

MSL is the core language of our tool. In practice, to be more user-friendly, 
our tool also accepts C programs with conditional branches and loops, both 
of which should be statically determinized (e.g., loops are bounded and can be
unrolled; the branching of conditionals can also be fixed after loop unrolling).
Furthermore, we assume there is no recursion or dynamic memory allocation.
These restrictions are sufficient for most symmetric cryptography and bitsliced 
implementations of public-key cryptography, which mostly have simple control 
graphs and memory aliases. 


Problem Formalization. Fix a program $P$ with all the procedures using order-$d$
masking. We denote by $P_o$ (resp. $P_m$) the program $P$ where all the masked
(resp. original) blocks are omitted. For each procedure $f$, the procedures $f_o$ and
$f_m$ are defined accordingly.


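Definition 1. Given a procedure $f$ of $P$ with $m$ input parameters, $f_m$ and $f_o$
are functionally equivalent, denoted by $f_m \cong f_o$, if the following statement holds:

\[
\forall a^1, \ldots, a^m, r_1, \ldots, r_h \in \mathbb{F},\ \forall \mathbf{a}^1, \ldots, \mathbf{a}^m \in \mathbb{F}^{d+1}.\;
\Big( \bigwedge_{i \in [1,m]} a^i = \bigoplus_{j \in [0,d]} a^i_j \Big) \rightarrow
\Big( f_o(a^1, \ldots, a^m) = \bigoplus_{i \in [0,d]} f_m(\mathbf{a}^1, \ldots, \mathbf{a}^m)_i \Big)
\]

where $r_1, \ldots, r_h$ are all the random variables used in $f_m$.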


Note that although the procedure $f_m$ is randomized (i.e., the output encoding
$f_m(\mathbf{a}^1, \ldots, \mathbf{a}^m)$ is technically a random variable), for functional equivalence we
consider a stronger notion, viz., to require that $f_m$ and $f_o$ are equivalent under
any values in the support of the random variables $r_1, \ldots, r_h$. Thus, $r_1, \ldots, r_h$
are universally quantified in Definition 1.

The verification problem is to check if $f_m \cong f_o$ for a given procedure $f$,
where $\bigwedge_{i \in [1,m]} a^i = \bigoplus_{j \in [0,d]} a^i_j$ and $f_o(a^1, \ldots, a^m) = \bigoplus_{i \in [0,d]} f_m(\mathbf{a}^1, \ldots, \mathbf{a}^m)_i$
are regarded as pre- and post-conditions, respectively. Thus, we assume the
unmasked procedures themselves are correct (which can be verified by, e.g.,
CryptoLine). Our focus is on whether the masked counterparts are functionally
equivalent to them.


4 Overview of the Approach 


In this section, we first present a motivating example given in Fig. 2, which
computes the multiplicative inverse in $\mathrm{GF}(2^8)$ for the AES S-Box [58] using first-order
Boolean masking. It consists of three affine transformation definitions and three
procedure definitions. For a given input $x$, exp2($x$) outputs $x^2$, exp4($x$) outputs
$x^4$ and exp16($x$) outputs $x^{16}$. Obviously, these three affine transformations are
indeed linear.


affine exp2 input x output y
    y ← x ⊗ x
affine exp4 input x output y
    y ← exp2(exp2(x))
affine exp16 input x output y
    y ← exp4(exp4(x))

proc sec_mult input a b output c
    c ← a ⊗ b
    shares 2
    r0 ← rand
    r1 ← r0 ⊕ (a0 ⊗ b1) ⊕ (a1 ⊗ b0)
    c0 ← (a0 ⊗ b0) ⊕ r0
    c1 ← (a1 ⊗ b1) ⊕ r1

proc refresh_masks input x output y
    y ← x
    shares 2
    r0 ← rand
    y0 ← x0 ⊕ r0
    y1 ← x1 ⊕ r0

proc sec_exp254 input x output y
    z ← exp2(x)
    y ← z ⊗ x
    w ← exp4(y)
    y ← y ⊗ w
    y ← exp16(y)
    y ← y ⊗ w
    y ← y ⊗ z
    shares 2
    z0 ← exp2(x0)
    z1 ← exp2(x1)
    z ← refresh_masks(z)
    y ← sec_mult(z, x)
    w0 ← exp4(y0)
    w1 ← exp4(y1)
    w ← refresh_masks(w)
    y ← sec_mult(y, w)
    y0 ← exp16(y0)
    y1 ← exp16(y1)
    y ← sec_mult(y, w)
    y ← sec_mult(y, z)


Fig. 2. Motivating example, where an unsubscripted variable x in a masked block denotes the encoding (x0, x1).


Procedure $\mathit{sec\_mult}_o(a, b)$ outputs $a \otimes b$. Its masked version $\mathit{sec\_mult}_m(\mathbf{a}, \mathbf{b})$
computes the encoding $\mathbf{c} = (c_0, c_1)$ over the encodings $\mathbf{a} = (a_0, a_1)$ and
$\mathbf{b} = (b_0, b_1)$. Clearly, it is desired that $c_0 \oplus c_1 = a \otimes b$ if $a_0 \oplus a_1 = a$ and
$b_0 \oplus b_1 = b$. Procedure $\mathit{refresh\_masks}_o(x)$ is the identity function while its masked
version $\mathit{refresh\_masks}_m(\mathbf{x})$ re-masks the encoding $\mathbf{x}$ using a random variable $r_0$.
Thus, it is desired that $y_0 \oplus y_1 = x$ if $x = x_0 \oplus x_1$. Procedure $\mathit{sec\_exp254}_o(x)$
computes the multiplicative inverse $x^{254}$ of $x$ in $\mathrm{GF}(2^8)$. Its masked version
$\mathit{sec\_exp254}_m(\mathbf{x})$ computes the encoding $\mathbf{y} = (y_0, y_1)$ where $\mathit{refresh\_masks}_m$ is
invoked to avoid power side-channel leakage. Thus, it is desired that $y_0 \oplus y_1 =
x^{254}$ if $x_0 \oplus x_1 = x$. In summary, it is required to prove $\mathit{sec\_mult}_m \cong \mathit{sec\_mult}_o$,
$\mathit{refresh\_masks}_m \cong \mathit{refresh\_masks}_o$ and $\mathit{sec\_exp254}_m \cong \mathit{sec\_exp254}_o$.
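
To make the three proof obligations concrete, the sketch below transcribes the motivating example into Python and checks the desired property of sec_exp254 by random testing. It is only a sanity check under our own assumptions (elements of GF(2^8) as 8-bit integers, the AES polynomial, a seeded pseudo-random generator in place of rand) and is not the term rewriting approach of this paper: random testing can expose a counterexample but cannot prove equivalence. The function `random_test` is a hypothetical helper.

```python
import random

AES_POLY = 0x11B  # X^8 + X^4 + X^3 + X + 1


def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
    return r


def exp2(x):  return gf_mul(x, x)
def exp4(x):  return exp2(exp2(x))
def exp16(x): return exp4(exp4(x))


def sec_exp254_o(x):
    """Original block of sec_exp254: computes x^254."""
    z = exp2(x)
    y = gf_mul(z, x)
    w = exp4(y)
    y = gf_mul(y, w)
    y = exp16(y)
    y = gf_mul(y, w)
    return gf_mul(y, z)


def sec_mult_m(a, b, rng):
    """Masked block of sec_mult with two shares."""
    r0 = rng.randrange(256)
    r1 = r0 ^ gf_mul(a[0], b[1]) ^ gf_mul(a[1], b[0])
    return (gf_mul(a[0], b[0]) ^ r0, gf_mul(a[1], b[1]) ^ r1)


def refresh_masks_m(x, rng):
    """Masked block of refresh_masks with two shares."""
    r0 = rng.randrange(256)
    return (x[0] ^ r0, x[1] ^ r0)


def sec_exp254_m(x, rng):
    """Masked block of sec_exp254 with two shares."""
    z = refresh_masks_m((exp2(x[0]), exp2(x[1])), rng)
    y = sec_mult_m(z, x, rng)
    w = refresh_masks_m((exp4(y[0]), exp4(y[1])), rng)
    y = sec_mult_m(y, w, rng)
    y = (exp16(y[0]), exp16(y[1]))
    y = sec_mult_m(y, w, rng)
    return sec_mult_m(y, z, rng)


def random_test(trials=2000, seed=0):
    """Check y0 ^ y1 == (x0 ^ x1)^254 on randomly chosen shares and masks."""
    rng = random.Random(seed)
    for _ in range(trials):
        x0, x1 = rng.randrange(256), rng.randrange(256)
        y0, y1 = sec_exp254_m((x0, x1), rng)
        if y0 ^ y1 != sec_exp254_o(x0 ^ x1):
            return False
    return True
```

Calling `random_test()` exercises the post-condition of Definition 1 on sampled inputs; the universally quantified statement itself is what the TRS-based procedure establishes.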


4.1 Our Approach 


An overview of FISCHER is shown in Fig. 3. The input program is expected
to follow the syntax of MSL but in C language. Moreover, the pre-conditions
and post-conditions of the verification problem are expressed by assume and
assert statements in the masked procedure, respectively. Recall that the input
program can contain conditional branches and loops when they can be statically
determinized. Furthermore, affine transformations can use other common operations
(e.g., shift, rotation and bitwise Boolean operations) besides the addition $\oplus$ and
multiplication $\otimes$ on the underlying field $\mathbb{F}$. FISCHER leverages the LLVM
framework to obtain the LLVM intermediate representation (IR) and call graph, where
all the procedure calls are inlined. It then invokes Affine Constant Computing to
iteratively compute the affine constants for affine transformations according to
the call graph, and Functional Equivalence Checking to check functional
equivalence, both of which rely on the underpinning engines, viz., Symbolic Execution
(which in this work refers to symbolic computation without path constraint solving),
Term Rewriting and SMT-based Solving.

[Figure: block diagram of FISCHER with the components Affine Constant Computing, Functional Equivalence Checking, Symbolic Execution, Term Rewriting, and SMT-based Solving.]

Fig. 3. Overview of FISCHER.

We apply intra-procedural symbolic execution to compute the symbolic out- 
puts of the procedures and transformations, i.e., expressions in terms of inputs, 
random variables and affine transformations. The symbolic outputs are treated 
as terms based on which both the problems of functional equivalence checking 
and affine constant computing are solved by rewriting to their normal forms (i.e., 
sums of monomials w.r.t. a total order). The analysis result is often conclusive 
from normal forms. In case it is inconclusive, we iteratively inline affine trans- 
formations when their definitions are available until either the analysis result 
is conclusive or no more affine transformations can be inlined. If the analysis 
result is still inconclusive, to reduce false positives, we apply random testing
and accurate (but computationally expensive) SMT solving to the normal forms
instead of the original terms. We remark that the term rewriting system alone
can prove almost all the benchmarks in our experiments.

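Consider the motivating example. To find the constant $c \in \mathbb{F}$ of exp2 such
that the property $\forall x, y \in \mathbb{F}.\ \mathrm{exp2}(x \oplus y) = \mathrm{exp2}(x) \oplus \mathrm{exp2}(y) \oplus c$ holds, by
applying symbolic execution, exp2($x$) is expressed as the term $x \otimes x$. Thus, the
property is reformulated as $(x \oplus y) \otimes (x \oplus y) = (x \otimes x) \oplus (y \otimes y) \oplus c$, from
which we can deduce that the desired affine constant $c$ is equivalent to the term
$((x \oplus y) \otimes (x \oplus y)) \oplus (x \otimes x) \oplus (y \otimes y)$. Our TRS will reduce the term as follows:

\[
\begin{array}{lll}
 & ((x \oplus y) \otimes (x \oplus y)) \oplus (x \otimes x) \oplus (y \otimes y) & \text{Distributive Law} \\
= & (x \otimes (x \oplus y)) \oplus (y \otimes (x \oplus y)) \oplus (x \otimes x) \oplus (y \otimes y) & \text{Distributive Law} \\
= & (x \otimes x) \oplus (x \otimes y) \oplus (y \otimes x) \oplus (y \otimes y) \oplus (x \otimes x) \oplus (y \otimes y) & \text{Commutative Law} \\
= & (x \otimes x) \oplus (x \otimes y) \oplus (x \otimes y) \oplus (y \otimes y) \oplus (x \otimes x) \oplus (y \otimes y) & \text{Commutative Law} \\
= & (x \otimes x) \oplus (x \otimes x) \oplus (x \otimes y) \oplus (x \otimes y) \oplus (y \otimes y) \oplus (y \otimes y) = 0 & \text{Zero Law of XOR}
\end{array}
\]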

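For the transformation exp4($x$), by applying symbolic execution, it can be
expressed as the term exp2(exp2($x$)). To find the constant $c \in \mathbb{F}$ that satisfies
$\forall x, y \in \mathbb{F}.\ \mathrm{exp4}(x \oplus y) = \mathrm{exp4}(x) \oplus \mathrm{exp4}(y) \oplus c$, we compute the term
$\mathrm{exp2}(\mathrm{exp2}(x \oplus y)) \oplus \mathrm{exp2}(\mathrm{exp2}(x)) \oplus \mathrm{exp2}(\mathrm{exp2}(y))$. By applying our TRS, we have:
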

Automated Verification of Correctness for Masked Arithmetic Programs 263 


exp2(exp2(x © y)) ® exp2(exp2(x)) ® exp2(exp2(y)) 
= exp2(exp2(z) © exp2(y)) © exp2(exp2(cr)) © exp2(exp2(y)) 
= exp2(exp2(x)) ® exp2(exp2(y)) ® exp2(exp2(x)) ® exp2(exp2(y)) 
( (exp2(y)) = 0 


= exp2(exp2(x)) ® exp2(exp2(x)) ® exp2(exp2(y)) ® exp2 


Clearly, the affine constant of exp4 is 0. Similarly, we can deduce that the affine 
constant of the transformation exp16 is 0 as well. 

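To prove $\mathit{sec\_mult}_o \cong \mathit{sec\_mult}_m$, by applying symbolic execution, we have
that $\mathit{sec\_mult}_o(a, b) = a \otimes b$ and $\mathit{sec\_mult}_m(\mathbf{a}, \mathbf{b}) = \mathbf{c} = (c_0, c_1)$, where $c_0 =
(a_0 \otimes b_0) \oplus r_0$ and $c_1 = (a_1 \otimes b_1) \oplus (r_0 \oplus (a_0 \otimes b_1) \oplus (a_1 \otimes b_0))$. Then, by Definition 1,
it suffices to check

\[
\begin{array}{l}
\forall a, b, a_0, a_1, b_0, b_1, r_0 \in \mathbb{F}.\ (a = a_0 \oplus a_1 \wedge b = b_0 \oplus b_1) \rightarrow \\
\quad \big(a \otimes b = ((a_0 \otimes b_0) \oplus r_0) \oplus ((a_1 \otimes b_1) \oplus (r_0 \oplus (a_0 \otimes b_1) \oplus (a_1 \otimes b_0)))\big).
\end{array}
\]

Thus, we check the term $((a_0 \oplus a_1) \otimes (b_0 \oplus b_1)) \oplus ((a_0 \otimes b_0) \oplus r_0) \oplus ((a_1 \otimes b_1) \oplus
(r_0 \oplus (a_0 \otimes b_1) \oplus (a_1 \otimes b_0)))$, which is equivalent to 0 iff $\mathit{sec\_mult}_o \cong \mathit{sec\_mult}_m$.
Our TRS is able to reduce the term to 0. Similarly, we represent the outputs
of $\mathit{sec\_exp254}_o$ and $\mathit{sec\_exp254}_m$ as terms via symbolic execution, from which
the statement $\mathit{sec\_exp254}_o \cong \mathit{sec\_exp254}_m$ is also encoded as a term, which
can be reduced to 0 via our TRS without inlining any transformations.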


5 Term Rewriting System 


In this section, we first introduce some basic notations and then present our 
term rewriting system. 


Definition 2. Given a program $P$ over $\mathbb{F}$, a signature $\Sigma_P$ of $P$ is a set of
symbols $\mathbb{F} \cup \{\oplus, \otimes, f_1, \ldots, f_t\}$, where $s \in \mathbb{F}$ with arity 0 are all the constants in
$\mathbb{F}$, $\oplus$ and $\otimes$ with arity 2 are addition and multiplication operators on $\mathbb{F}$, and
$f_1, \ldots, f_t$ with arity 1 are affine transformations defined/declared in $P$.


For example, the signature of the motivating example is $\mathbb{F} \cup
\{\oplus, \otimes, \mathrm{exp2}, \mathrm{exp4}, \mathrm{exp16}\}$. When it is clear from the context, the subscript $P$
is dropped from $\Sigma_P$.


Definition 3. Let $V$ be a set of variables (assuming $\Sigma \cap V = \emptyset$), the set $T[\Sigma, V]$
of $\Sigma$-terms over $V$ is inductively defined as follows:

- $\mathbb{F} \subseteq T[\Sigma, V]$ and $V \subseteq T[\Sigma, V]$ (i.e., every variable/constant is a $\Sigma$-term);
- $\tau \oplus \tau' \in T[\Sigma, V]$ and $\tau \otimes \tau' \in T[\Sigma, V]$ if $\tau, \tau' \in T[\Sigma, V]$ (i.e., application of
  addition and multiplication operators to $\Sigma$-terms yields $\Sigma$-terms);
- $f_j(\tau) \in T[\Sigma, V]$ if $\tau \in T[\Sigma, V]$ and $j \in [1, t]$ (i.e., application of affine
  transformations to $\Sigma$-terms yields $\Sigma$-terms).

We denote by $T_{\setminus\oplus}(\Sigma, V)$ the set of $\Sigma$-terms that do not use the operator $\oplus$.

A $\Sigma$-term $\alpha \in T[\Sigma, V]$ is called a factor if $\alpha \in \mathbb{F} \cup V$ or $\alpha = f_i(\tau')$ for some
$i \in [1, t]$ such that $\tau' \in T_{\setminus\oplus}(\Sigma, V)$. A monomial is a product $\alpha_1 \otimes \cdots \otimes \alpha_k$ of
non-zero factors for $k \geq 1$. We denote by $M[\Sigma, V]$ the set of monomials. For
instance, consider variables $x, y \in V$ and affine transformations $f_1, f_2 \in \Sigma$. All
of $f_1(f_2(x)) \otimes f_1(y)$, $f_1(2 \otimes f_2(4 \otimes x))$, $f_1(x \oplus y)$ and $f_1(f_2(x)) \oplus f_1(x)$ are $\Sigma$-terms,
both $f_1(f_2(x)) \otimes f_1(y)$ and $f_1(2 \otimes f_2(4 \otimes x))$ are monomials, while neither $f_1(x \oplus y)$
nor $f_1(f_2(x)) \oplus f_1(x)$ is a monomial. For the sake of presentation, $\Sigma$-terms will be
written as terms, and the operator $\otimes$ may be omitted, e.g., $\tau_1\tau_2$ denotes $\tau_1 \otimes \tau_2$,
and $\tau^2$ denotes $\tau \otimes \tau$.


Definition 4. A polynomial is a sum $\bigoplus_{i \in [1,t]} m_i$ of monomials $m_1, \ldots, m_t \in
M[\Sigma, V]$. We use $P[\Sigma, V]$ to denote the set of polynomials.


To simplify and normalize polynomials, we impose a total order on monomials 
and their factors. 


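Definition 5. Fix an arbitrary total order $\geq_s$ on $V \uplus \Sigma$.
For two factors $\alpha$ and $\alpha'$, the factor order $\geq_l$ is defined such that $\alpha \geq_l \alpha'$ if
one of the following conditions holds:

- $\alpha, \alpha' \in \mathbb{F} \cup V$ and $\alpha \geq_s \alpha'$;
- $\alpha = f(\tau)$ and $\alpha' = f'(\tau')$ such that $f >_s f'$ or ($f = f'$ and $\tau \geq_p \tau'$);
- $\alpha = f(\tau)$ such that $f \geq_s \alpha'$ or $\alpha' = f(\tau)$ such that $\alpha \geq_s f$.

Given a monomial $m = \alpha_1 \cdots \alpha_k$, we write $\mathrm{sort}_{\geq_l}(\alpha_1, \ldots, \alpha_k)$ for the monomial
which includes $\alpha_1, \ldots, \alpha_k$ as factors, but sorts them in descending order.

Given two monomials $m = \alpha_1 \cdots \alpha_k$ and $m' = \alpha'_1 \cdots \alpha'_{k'}$, the monomial
order $\geq_p$ is defined as the lexicographical order between $\mathrm{sort}_{\geq_l}(\alpha_1, \ldots, \alpha_k)$ and
$\mathrm{sort}_{\geq_l}(\alpha'_1, \ldots, \alpha'_{k'})$.


Intuitively, the factor order $\geq_l$ follows the given order $\geq_s$ on $V \uplus \Sigma$, where
the factor order between two factors with the same affine transformation $f$
is determined by their parameters. We note that if $\mathrm{sort}_{\geq_l}(\alpha'_1, \ldots, \alpha'_{k'})$ is a
prefix of $\mathrm{sort}_{\geq_l}(\alpha_1, \ldots, \alpha_k)$, we have $\alpha_1 \cdots \alpha_k \geq_p \alpha'_1 \cdots \alpha'_{k'}$. Furthermore, if
$\alpha_1 \cdots \alpha_k \geq_p \alpha'_1 \cdots \alpha'_{k'}$ and $\alpha'_1 \cdots \alpha'_{k'} \geq_p \alpha_1 \cdots \alpha_k$, then $\mathrm{sort}_{\geq_l}(\alpha'_1, \ldots, \alpha'_{k'}) =
\mathrm{sort}_{\geq_l}(\alpha_1, \ldots, \alpha_k)$. We write $\alpha_1 \cdots \alpha_k >_p \alpha'_1 \cdots \alpha'_{k'}$ if $\alpha_1 \cdots \alpha_k \geq_p \alpha'_1 \cdots \alpha'_{k'}$
but $\mathrm{sort}_{\geq_l}(\alpha'_1, \ldots, \alpha'_{k'}) \neq \mathrm{sort}_{\geq_l}(\alpha_1, \ldots, \alpha_k)$.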
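
The monomial order can be illustrated with a small Python sketch restricted to monomials whose factors are variables only. The ranking `RANK` (with x above y above z) is a hypothetical instance of the base order on symbols, chosen only for this example; factors that are applications of affine transformations are not covered here.

```python
# Hypothetical base order on three variables: x > y > z.
RANK = {"x": 3, "y": 2, "z": 1}


def sort_desc(monomial):
    """Sort the factors of a monomial in descending order of rank."""
    return tuple(sorted((RANK[v] for v in monomial), reverse=True))


def geq_p(m1, m2):
    """Monomial order: lexicographic comparison of the sorted factor lists."""
    return sort_desc(m1) >= sort_desc(m2)
```

Python compares tuples lexicographically and treats a proper prefix as smaller, which matches the prefix property noted above: `geq_p(["x", "y", "z"], ["x", "y"])` holds while the converse does not, and `["y", "x"]` and `["x", "y"]` are each greater than or equal to the other because they sort to the same list.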


Proposition 1. The monomial order $\geq_p$ is a total order on monomials.


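Definition 6. Given a program $P$, we define the corresponding term rewriting
system (TRS) $\mathcal{R}$ as a tuple $(\Sigma, V, \geq_s, \Delta)$, where $\Sigma$ is a signature of $P$, $V$ is a
set of variables of $P$ (assuming $\Sigma \cap V = \emptyset$), $\geq_s$ is a total order on $V \uplus \Sigma$, and
$\Delta$ is the set of term rewriting rules given below:

\[
\begin{array}{ll}
\text{R1:} & m_1 \oplus \cdots \oplus m_k \mapsto m'_1 \oplus \cdots \oplus m'_k \quad \text{if } (m'_1, \ldots, m'_k) = \mathrm{sort}_{\geq_p}(m_1, \ldots, m_k) \neq (m_1, \ldots, m_k) \\
\text{R2:} & \alpha_1 \cdots \alpha_k \mapsto \alpha'_1 \cdots \alpha'_k \quad \text{if } (\alpha'_1, \ldots, \alpha'_k) = \mathrm{sort}_{\geq_l}(\alpha_1, \ldots, \alpha_k) \neq (\alpha_1, \ldots, \alpha_k) \\
\text{R3:} & \tau \oplus \tau \mapsto 0 \\
\text{R4:} & \tau 0 \mapsto 0 \\
\text{R5:} & 0 \tau \mapsto 0 \\
\text{R6:} & \tau \oplus 0 \mapsto \tau \\
\text{R7:} & 0 \oplus \tau \mapsto \tau \\
\text{R8:} & \tau 1 \mapsto \tau \\
\text{R9:} & 1 \tau \mapsto \tau \\
\text{R10:} & (\tau_1 \oplus \tau_2)\tau \mapsto (\tau_1\tau) \oplus (\tau_2\tau) \\
\text{R11:} & \tau(\tau_1 \oplus \tau_2) \mapsto (\tau\tau_1) \oplus (\tau\tau_2) \\
\text{R12:} & f(\tau_1 \oplus \tau_2) \mapsto f(\tau_1) \oplus f(\tau_2) \oplus c \\
\text{R13:} & f(0) \mapsto c
\end{array}
\]

where $m_1, \ldots, m_k, m'_1, \ldots, m'_k \in M[\Sigma, V]$, $\alpha_1, \ldots, \alpha_k, \alpha'_1, \ldots, \alpha'_k$ are factors, $\tau, \tau_1, \tau_2 \in T[\Sigma, V]$
are terms, and $f \in \Sigma$ is an affine transformation with affine constant $c$.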


Intuitively, rules R1 and R2 specify the commutativity of $\oplus$ and $\otimes$, respectively,
by which monomials and factors are sorted according to the orders $\geq_p$
and $\geq_l$, respectively. Rule R3 specifies that $\oplus$ is essentially bitwise XOR. Rules
R4 and R5 specify that 0 is the multiplicative zero. Rules R6 and R7 (resp. R8
and R9) specify that 0 (resp. 1) is additive (resp. multiplicative) identity. Rules
R10 and R11 express the distributivity of $\otimes$ over $\oplus$. Rule R12 expresses the
affine property of an affine transformation while rule R13 is an instance of rule
R12 via rules R3 and R5.
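
The combined effect of these rules can be sketched in Python. The sketch below is not the rule-by-rule rewriting engine of FISCHER: it computes the result directly, representing a polynomial as a set of monomials (so that XOR is symmetric difference, covering R3, R6 and R7) and a monomial as a sorted tuple of factors (covering R1 and R2). It is deliberately restricted to the constants 0 and 1 and to affine constants in {0, 1}; the term constructors and the `aff_const` mapping are hypothetical names introduced for this example.

```python
from itertools import product


def var(name):  return ("var", name)
def const(bit): return ("const", bit)       # only 0 and 1 in this sketch
def mul(a, b):  return ("mul", a, b)
def app(f, t):  return ("app", f, t)


def xor(*terms):
    """Left-nested XOR of one or more terms."""
    t = terms[0]
    for u in terms[1:]:
        t = ("xor", t, u)
    return t


def _mul_poly(p, q):
    """Distribute the product over XOR (R10, R11) and sort factors (R2)."""
    out = set()
    for m1, m2 in product(p, q):
        out ^= {tuple(sorted(m1 + m2, key=repr))}   # equal monomials cancel
    return frozenset(out)


def normalize(t, aff_const):
    """Return the polynomial of t as a frozenset of monomials; () encodes 1."""
    kind = t[0]
    if kind == "const":
        return frozenset({()}) if t[1] else frozenset()
    if kind == "var":
        return frozenset({(t[1],)})
    if kind == "xor":
        return normalize(t[1], aff_const) ^ normalize(t[2], aff_const)
    if kind == "mul":
        return _mul_poly(normalize(t[1], aff_const), normalize(t[2], aff_const))
    if kind == "app":
        f, arg = t[1], normalize(t[2], aff_const)
        out = {((f, m),) for m in arg}      # push f through the sum (R12)
        # f applied to k summands contributes k - 1 copies of c; f(0) gives c (R13)
        if aff_const[f] and len(arg) % 2 == 0:
            out ^= {()}
        return frozenset(out)
    raise ValueError(kind)
```

With this representation, a term is equivalent to 0 exactly when `normalize` returns the empty set; for example, the term built for sec_mult in Sect. 4.1 normalizes to `frozenset()`.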

Given a TRS $\mathcal{R} = (\Sigma, V, \geq_s, \Delta)$ for a given program $P$, a term $\tau \in T[\Sigma, V]$
can be rewritten to a term $\tau'$, denoted by $\tau \Rightarrow \tau'$, if there is a rewriting rule
$\tau_1 \mapsto \tau_2$ such that $\tau'$ is a term obtained from $\tau$ by replacing an occurrence of the
sub-term $\tau_1$ with the sub-term $\tau_2$. A term is in a normal form if no rewriting
rules can be applied. A TRS is terminating if all terms can be rewritten to a
normal form after finitely many rewriting steps. We write $\tau \Rrightarrow \tau'$ if $\tau'$ is
a normal form of $\tau$.

We show that any TRS R associated with a program P is terminating, and 
that any term will be rewritten to a normal form that is a polynomial, indepen- 
dent of the way of applying rules. 


Lemma 1. For every normal form $\tau \in T[\Sigma, V]$ of the TRS $\mathcal{R}$, the term $\tau$ must
be a polynomial $m_1 \oplus \cdots \oplus m_k$ such that (1) $\forall i \in [1, k-1]$, $m_i >_p m_{i+1}$, and
(2) for every monomial $m_i = \alpha_1 \cdots \alpha_h$ and $\forall j \in [1, h-1]$, $\alpha_j \geq_l \alpha_{j+1}$.


Proof. Consider a normal form $\tau \in T[\Sigma, V]$. If $\tau$ is not a polynomial, then there
must exist some monomial $m_i$ in which the addition operator $\oplus$ is used. This
means that either rule R10 or R11 is applicable to the term $\tau$, which contradicts
the fact that $\tau$ is a normal form.

Suppose $\tau$ is the polynomial $m_1 \oplus \cdots \oplus m_k$.

- If there exists $i$ with $1 \leq i < k$ such that $m_i >_p m_{i+1}$ does not hold, then either
  $m_i = m_{i+1}$ or $m_{i+1} >_p m_i$. If $m_i = m_{i+1}$, then rule R3 is applicable to the
  term $\tau$. If $m_{i+1} >_p m_i$, then rule R1 is applicable to the term $\tau$. Thus, for
  every $1 \leq i < k$, $m_i >_p m_{i+1}$.
- If there exist a monomial $m_i = \alpha_1 \cdots \alpha_h$ and $j$ with $1 \leq j < h$ such that $\alpha_j \geq_l
  \alpha_{j+1}$ does not hold, then $\alpha_{j+1} >_l \alpha_j$. This means that rule R2 is applicable to
  the term $\tau$. Thus, for every monomial $m_i = \alpha_1 \cdots \alpha_h$ and every $1 \leq j < h$,
  $\alpha_j \geq_l \alpha_{j+1}$.

Lemma 2. The TRS $\mathcal{R} = (\Sigma, V, \geq_s, \Delta)$ of a given program $P$ is terminating.


Proof. Consider a term $\tau \in T[\Sigma, V]$. Let $\pi = \tau_1 \Rightarrow \tau_2 \Rightarrow \tau_3 \Rightarrow \cdots \Rightarrow \tau_i \Rightarrow \cdots$ be
a reduction of the term $\tau$ by applying rewriting rules, i.e., $\tau = \tau_1$. We prove that
the reduction $\pi$ is finite by showing that all the rewriting rules can be applied
only finitely many times.

First, since rules R1 and R2 only sort the monomials and factors, respectively,
while sorting always terminates using any classic sorting algorithm (e.g., quick
sort algorithm), rules R1 and R2 can only be consecutively applied finitely many times for
each term $\tau_i$ due to the premises $\mathrm{sort}_{\geq_p}(m_1, \ldots, m_k) \neq (m_1, \ldots, m_k)$ and
$\mathrm{sort}_{\geq_l}(\alpha_1, \ldots, \alpha_k) \neq (\alpha_1, \ldots, \alpha_k)$ in rules R1 and R2, respectively.

Second, rules R10, R11 and R12 can only be applied finitely many times in the reduction
$\pi$, as these rules always push the addition operator $\oplus$ toward the root of the
syntax tree of the term $\tau_i$ when one of them is applied onto a term $\tau_i$, while the
other rules either eliminate or reorder the addition operator $\oplus$.

Lastly, rules R3-R9 and R13 can only be applied finitely many times in the reduction $\pi$,
as these rules reduce the size of the term by 1 when one of them is applied onto
a term $\tau_i$ while the rules R10-R12 that increase the size of the term can only be
applied finitely many times.

Hence, the reduction $\pi$ is finite, indicating that the TRS $\mathcal{R}$ is terminating.


Algorithm 1: Term Normalization

1 Function TermNorm(R, τ, λ):
2     Rewrite τ by iteratively applying rules R3-R13 until no more update;
3     τ' ← sort(τ) by iteratively applying rule R2;
4     τ' ← sort(τ') by iteratively applying rule R1;
5     Rewrite τ' by iteratively applying rules R3, R6, R7 until no more update;
6     return τ'


By Lemmas 1 and 2, any term $\tau \in T[\Sigma, V]$ can be rewritten to a normal
form that must be a polynomial.


Theorem 1. Let $\mathcal{R} = (\Sigma, V, \geq_s, \Delta)$ be the TRS of a program $P$. For any term
$\tau \in T[\Sigma, V]$, a polynomial $\tau' \in T[\Sigma, V]$ can be computed such that $\tau \Rrightarrow \tau'$.


Remark 1. Besides termination, confluence is another important
property of a TRS: a TRS is confluent if, whenever a term $\tau \in T[\Sigma, V]$
can be rewritten to two distinct terms $\tau_1$ and $\tau_2$, the terms $\tau_1$ and $\tau_2$ can
be reduced to a common term. While we conjecture that the TRS $\mathcal{R}$ associated
with the given program is indeed confluent, which may be shown via its local
confluence [51], we do not strive to prove its confluence, as it is irrelevant to the
problem considered in the current work.
6 Algorithmic Verification


In this section, we first present an algorithm for computing normal forms, then 
show how to compute the affine constant for an affine transformation, and finally 
propose an algorithm for solving the verification problem. 


6.1 Term Normalization Algorithm 


We provide the function TermNorm (cf. Algorithm 1) which applies the rewriting
rules in a particular order aiming for better efficiency. Fix a TRS $\mathcal{R} = (\Sigma, V, \geq_s,
\Delta)$, a term $\tau \in T[\Sigma, V]$ and a mapping $\lambda$ that provides the required affine constants
$\lambda(f)$. TermNorm($\mathcal{R}, \tau, \lambda$) returns a normal form $\tau'$ of $\tau$, i.e., $\tau \Rrightarrow \tau'$.


Algorithm 2: Computing Affine Constants

 1 Function AffConst(P, R, G):
 2     foreach affine transformation f in a topological order of call graph G do
 3         if f is only declared in P then
 4             λ(f) ← 0;
 5         else
 6             x ← input of f;
 7             ξ(x) ← symbolicExecution(f);
 8             τ ← ξ(x)[x ↦ x ⊕ y] ⊕ ξ(x) ⊕ ξ(x)[x ↦ y];
 9             while True do
10                 τ ← TermNorm(R, τ, λ);
11                 if τ is some constant c then
12                     λ(f) ← c; break;
13                 else if g is defined in P but has not been inlined in τ then
14                     Inline g in τ; continue;
15                 else if τ does not contain any uninterpreted function then
16                     v1, u1, v2, u2 ← random values from F s.t. v1 ≠ v2 ∨ u1 ≠ u2;
17                     if τ[x ↦ v1, y ↦ u1] ≠ τ[x ↦ v2, y ↦ u2] then
18                         Emit(f is not affine) and Abort;
19                 if SMTSolver(∀x.∀y.τ = c) = SAT then
20                     λ(f) ← extract c from the model; break;
21                 else Emit(f may not be affine) and Abort;
22     return λ;


TermNorm first applies rules R3-R13 to rewrite the term $\tau$ (line 2), resulting
in a polynomial which does not have 0 as a factor or monomial (due to rules
R4-R7), or 1 as a factor in a monomial unless the monomial itself is 1 (due to
rules R8 and R9). Next, it recursively sorts all the factors and monomials involved
in the polynomial from the innermost sub-terms (lines 3 and 4). Sorting factors
and monomials will place the same monomials at adjacent positions. Finally,
rules R3 and R6-R7 are further applied to simplify the polynomial (line 5),
where consecutive syntactically equivalent monomials will be rewritten to 0 by
rule R3, which may further enable rules R6-R7. Obviously, the final term $\tau'$ is
a normal form of the input $\tau$, although its size may be exponential in that of $\tau$.


Lemma 3. TermNorm($\mathcal{R}, \tau, \lambda$) returns a normal form $\tau'$ of $\tau$.


6.2 Computing Affine Constants 


The function AffConst in Algorithm 2 computes the associated affine constant
for each affine transformation $f$. It processes all affine transformations in a
topological order based on the call graph $G$ (lines 2-21). If $f$ is only declared in
$P$, as mentioned previously, we assume it is linear, thus 0 is assigned to $\lambda(f)$
(line 4). Otherwise, it extracts the input $x$ of $f$ and computes its output $\xi(x)$
via symbolic execution (line 7), where $\xi(x)$ is treated as $f(x)$. We remark that
during symbolic execution, we adopt a lazy strategy for inlining invoked affine
transformations in $f$ to reduce the size of $\xi(x)$. Thus, $\xi(x)$ may contain affine
transformations.

Recall that $c$ is the affine constant of $f$ iff $\forall x, y \in \mathbb{F}.\ f(x \oplus y) = f(x) \oplus f(y) \oplus c$
holds. Thus, we create the term $\tau = \xi(x)[x \mapsto x \oplus y] \oplus \xi(x) \oplus \xi(x)[x \mapsto y]$ (line 8),
where $e[a \mapsto b]$ denotes the substitution of $a$ with $b$ in $e$. The term $\tau$
is equivalent to some constant $c$ iff $c$ is the affine constant of $f$.
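
To make the definition concrete, the following sketch checks it by brute force for 8-bit values. It is not the method of Algorithm 2 (which works symbolically); the helper names `rotl` and `affine_constant` are hypothetical, and `af` follows the definition given in Sect. 7.1 (XOR of four left rotations and the constant 99).

```python
def rotl(x, i, width=8):
    """Rotate the width-bit value x left by i bits."""
    mask = (1 << width) - 1
    return ((x << i) | (x >> (width - i))) & mask


def af(x):
    # af as defined in Sect. 7.1: XOR of four rotations and the constant 99.
    return rotl(x, 1) ^ rotl(x, 2) ^ rotl(x, 3) ^ rotl(x, 4) ^ 99


def affine_constant(f, width=8):
    """Return c if f(x ^ y) ^ f(x) ^ f(y) == c for all x, y; otherwise None."""
    n = 1 << width
    values = {f(x ^ y) ^ f(x) ^ f(y) for x in range(n) for y in range(n)}
    return values.pop() if len(values) == 1 else None


if __name__ == "__main__":
    print(affine_constant(af))                         # constant of af
    print(affine_constant(lambda x: rotl(x, 3)))       # a pure rotation is linear
    print(affine_constant(lambda x: x & rotl(x, 1)))   # AND makes it non-affine
```

Exhaustive enumeration is only feasible for small bit-widths, which is one reason a symbolic treatment of the term is attractive.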

The while-loop (lines 9-21) evaluates $\tau$. First, it rewrites $\tau$ to a normal form
(line 10) by invoking TermNorm in Algorithm 1. If the normal form is some constant $c$,
then $c$ is the affine constant of $f$. Otherwise, AffConst repeatedly inlines each
affine transformation $g$ that is defined in $P$ but has not been inlined in $\tau$ (lines 13
and 14) and rewrites the term $\tau$ to a normal form until either the normal form
is some constant $c$ or no affine transformation can be inlined. If the normal form
is still not a constant, $\tau$ is evaluated using random input values. Clearly, if $\tau$
evaluates to two distinct values (line 18), $f$ is not affine. Otherwise, we check the
satisfiability of the constraint $\forall x, y.\ \tau = c$ via an SMT solver in bitvector theory
(line 19), where declared but undefined affine transformations are treated as
uninterpreted functions provided with their affine properties. If $\forall x, y.\ \tau = c$ is
satisfiable, we extract the affine constant $c$ from its model (line 20). Otherwise,
we emit an error and then abort (line 21), indicating that the affine constant of
$f$ cannot be computed. Since the satisfiability problem modulo bitvector theory
is decidable, we can conclude that $f$ is not affine if $\forall x. \forall y.\ \tau = c$ is unsatisfiable
and no uninterpreted function is involved in $\tau$.


Lemma 4. Assume an affine transformation $f$ in $P$. If AffConst($P$, $R$, $G$) in
Algorithm 2 returns a mapping $\lambda$, then $\lambda(f)$ is the affine constant of $f$.


6.3 Verification Algorithm 


The verification problem is solved by the function Verifier($P$) in Algorithm 3,
which checks if $f_m \cong f_o$, for each procedure $f$ defined in $P$. It first preprocesses
the given program $P$ by inlining all the procedures, unrolling all the loops and
eliminating all the branches (line 2). Then, it computes the corresponding TRS
$R$, call graph $G$ and affine constants as the mapping $\lambda$, respectively (line 3). Next,
it iteratively checks if $f_m \cong f_o$, for each procedure $f$ defined in $P$ (lines 4-23).

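For each procedure $f$, it first extracts the inputs $a^1, \ldots, a^m$ of $f_o$ that are
scalar variables (line 5) and the input encodings $\mathbf{a}^1, \ldots, \mathbf{a}^m$ of $f_m$ that are vectors of
variables (line 6). Then, it computes the output $\xi(a^1, \ldots, a^m)$ of $f_o$ via symbolic
execution, which yields an expression in terms of $a^1, \ldots, a^m$ and affine
transformations (line 7). Similarly, it computes the output $\xi'(\mathbf{a}^1, \ldots, \mathbf{a}^m)$ of $f_m$ via
symbolic execution, i.e., a tuple of expressions in terms of the entries of the input
encodings $\mathbf{a}^1, \ldots, \mathbf{a}^m$, random variables and affine transformations (line 8).

Recall that $f_m \cong f_o$ iff for all $a^1, \ldots, a^m, r_1, \ldots, r_h \in \mathbb{F}$ and for all
$\mathbf{a}^1, \ldots, \mathbf{a}^m \in \mathbb{F}^{d+1}$, the following constraint holds (cf. Definition 1):


$$\Big(\bigwedge_{i \in [1,m]} a^i = \bigoplus_{j \in [0,d]} \mathbf{a}^i_j\Big) \rightarrow \Big(f_o(a^1, \ldots, a^m) = \bigoplus_{i \in [0,d]} f_m(\mathbf{a}^1, \ldots, \mathbf{a}^m)_i\Big)$$


where $r_1, \ldots, r_h$ are all the random variables used in $f_m$. Thus, it creates the term
$\tau = \xi(a^1, \ldots, a^m)[a^1 \mapsto \bigoplus \mathbf{a}^1, \ldots, a^m \mapsto \bigoplus \mathbf{a}^m] \oplus \bigoplus \xi'(\mathbf{a}^1, \ldots, \mathbf{a}^m)$ (line 9),
where $a^i \mapsto \bigoplus \mathbf{a}^i$ is the substitution of $a^i$ with the term $\bigoplus \mathbf{a}^i$ in the expression
$\xi(a^1, \ldots, a^m)$. Obviously, $\tau$ is equivalent to 0 iff $f_m \cong f_o$.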


Algorithm 3: Verification Algorithm

 1  Function Verifier($P$):
 2    Inline all the procedures, unroll loops and eliminate branches in $P$;
 3    $R \leftarrow$ buildTRS($P$); $G \leftarrow$ buildCallGraph($P$); $\lambda \leftarrow$ AffConst($P$, $R$, $G$);
 4    foreach procedure $f$ defined in $P$ do
 5      Let $a^1, \ldots, a^m$ be the inputs of $f_o$;
 6      Let $\mathbf{a}^1, \ldots, \mathbf{a}^m$ be the input encodings of $f_m$;
 7      $\xi(a^1, \ldots, a^m) \leftarrow$ symbolicExecution($f_o$);
 8      $\xi'(\mathbf{a}^1, \ldots, \mathbf{a}^m) \leftarrow$ symbolicExecution($f_m$);
 9      $\tau \leftarrow \xi(a^1, \ldots, a^m)[a^1 \mapsto \bigoplus \mathbf{a}^1, \ldots, a^m \mapsto \bigoplus \mathbf{a}^m] \oplus \bigoplus \xi'(\mathbf{a}^1, \ldots, \mathbf{a}^m)$;
10      while True do
11        $\tau \leftarrow$ TermNorm($R$, $\tau$, $\lambda$);
12        if $\tau$ is some constant $c$ then
13          if $c = 0$ then Emit($f$ is correct); break;
14          else Emit($f$ is incorrect); break;
15        else if $g$ is defined in $P$ but has not been inlined in $\tau$ then
16          Inline $g$ in $\tau$; continue;
17        else if $\tau$ does not contain any uninterpreted function then
18          $\mathbf{v}^1, \ldots, \mathbf{v}^m \leftarrow$ random values from $\mathbb{F}^{d+1}$;
19          if $\tau[\mathbf{a}^1 \mapsto \mathbf{v}^1, \ldots, \mathbf{a}^m \mapsto \mathbf{v}^m] \neq 0$ then
20            Emit($f$ is incorrect); break;
21        if SMTSolver($\tau \neq 0$) = UNSAT then
22          Emit($f$ is correct); break;
23        else Emit($f$ may be incorrect); break;

To check if $\tau$ is equivalent to 0, similar to computing affine constants in
Algorithm 2, the algorithm repeatedly rewrites the term $\tau$ to a normal form by
invoking TermNorm in Algorithm 1 until either a conclusion is drawn or no
affine transformation can be inlined (lines 10-23). We declare that $f$ is correct if
the normal form is 0 (line 13) and incorrect if it is a non-zero constant (line 14).
If the normal form is not a constant, we repeatedly inline an affine transformation
$g$ defined in $P$ which has not been inlined in $\tau$ and re-check the term $\tau$.

If there is no definite answer after inlining all the affine transformations, $\tau$
is evaluated using random input values. $f$ is incorrect if $\tau$ is non-zero (line 20).
Otherwise, we check the satisfiability of the constraint $\tau \neq 0$ via an SMT solver
in bitvector theory (line 21). If $\tau \neq 0$ is unsatisfiable, then $f$ is correct. Otherwise
we can conclude that $f$ is incorrect if no uninterpreted function is involved in $\tau$,
but in other cases it is not conclusive.
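
The equivalence being checked can be illustrated on a tiny instance by exhaustive enumeration instead of term rewriting. The sketch below is an assumption-laden toy: it uses a first-order (d = 1) masked multiplication written in the style of ISW over GF(16) = GF(2)[X]/(X^4 + X + 1), and the names `gf16_mul`, `mult_masked`, `mult_masked_buggy` and `find_counterexample` are hypothetical. It evaluates the analogue of the term τ (XOR of the output shares, XORed with the unmasked result on the decoded inputs) and looks for an input where it is non-zero.

```python
from itertools import product


def gf16_mul(a, b):
    """Multiply in GF(16) = GF(2)[X]/(X^4 + X + 1) by shift-and-add."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13          # reduce by X^4 + X + 1
    return r


MUL = [[gf16_mul(a, b) for b in range(16)] for a in range(16)]


def mult_masked(a0, a1, b0, b1, r):
    """First-order masked multiplication (ISW style); returns two output shares."""
    c0 = MUL[a0][b0] ^ r
    c1 = MUL[a1][b1] ^ ((r ^ MUL[a0][b1]) ^ MUL[a1][b0])
    return c0, c1


def mult_masked_buggy(a0, a1, b0, b1, r):
    """Same as above, but one cross term is deliberately dropped."""
    c0 = MUL[a0][b0] ^ r
    c1 = MUL[a1][b1] ^ (r ^ MUL[a0][b1])
    return c0, c1


def find_counterexample(masked):
    """Return inputs on which the decoded masked output differs from the unmasked one."""
    for a0, a1, b0, b1, r in product(range(16), repeat=5):
        c0, c1 = masked(a0, a1, b0, b1, r)
        if c0 ^ c1 ^ MUL[a0 ^ a1][b0 ^ b1] != 0:   # toy analogue of tau != 0
            return (a0, a1, b0, b1, r)
    return None


if __name__ == "__main__":
    print(find_counterexample(mult_masked))
    print(find_counterexample(mult_masked_buggy))
```

The enumeration visits 16^5 inputs here and grows exponentially with the field size and masking order, which is why the algorithm above first tries to settle the question by normalising τ symbolically.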


Theorem 2. Assume a procedure $f$ in $P$. If Verifier($P$) emits "$f$ is correct",
then $f_m \cong f_o$; if Verifier($P$) emits "$f$ is incorrect" or "$f$ may be incorrect"
with no uninterpreted function involved in its final term $\tau$, then $f_m \not\cong f_o$.


6.4 Implementation Remarks 


To implement the algorithms, we use the total order >, on V W X where all 
the constants are smaller than the variables, which are in turn smaller than the 
affine transformations. The order of constants is the standard one on integers, 
and the order of variables (affine transformations) uses lexicographic order. 

In terms of data structure, each term is primarily stored by a directed acyclic 
graph, allowing us to represent and rewrite common sub-terms in an optimised 
way. Once a (sub-)term becomes a polynomial during term rewriting, it is stored 
as a sorted nested list w.r.t. the monomial order >,, where each monomial is 
also stored as a sorted list w.r.t. the factor order >;. Moreover, the factor of the 
form a* in a monomial is stored by a pair (a, k). 

We also adopted two strategies: (i) By Fermat's little theorem [63], $x^{2^n - 1} = 1$
for any $x \in \mathrm{GF}(2^n)$. Hence each $k$ in $(\alpha, k)$ can be simplified to $k \bmod (2^n - 1)$.
(ii) By rule R12, a term $f(\tau_1 \oplus \cdots \oplus \tau_k)$ can be directly rewritten to
$f(\tau_1) \oplus \cdots \oplus f(\tau_k)$ if $k$ is odd, and $f(\tau_1) \oplus \cdots \oplus f(\tau_k) \oplus c$ if $k$ is even, where $c$ is the affine
constant associated with the affine transformation f.
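
Strategy (ii) can be sanity-checked on concrete values. The sketch below is illustrative only: it reuses the 8-bit transformation af from Sect. 7.1 (affine constant 99) and a hypothetical helper `apply_to_xor_sum` that distributes an affine transformation over a non-empty XOR sum, adding the constant only when the number of operands is even.

```python
from functools import reduce
from operator import xor


def rotl8(x, i):
    """Rotate the 8-bit value x left by i bits."""
    return ((x << i) | (x >> (8 - i))) & 0xFF


def af(x):
    # af as defined in Sect. 7.1; its affine constant is 99.
    return rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 99


def apply_to_xor_sum(f, c, terms):
    """Compute f(t1 ^ ... ^ tk) as f(t1) ^ ... ^ f(tk), XORing c only for even k.

    `terms` must be non-empty; c is the affine constant of f.
    """
    image = reduce(xor, (f(t) for t in terms))
    return image if len(terms) % 2 == 1 else image ^ c


if __name__ == "__main__":
    terms = [0x12, 0xA7, 0x3C, 0xF0]
    print(af(reduce(xor, terms)) == apply_to_xor_sum(af, 99, terms))
```

The parity condition follows from writing f as a linear part plus c: distributing over k operands produces k copies of c, of which all but one (k odd) or all (k even) cancel against the single c on the left-hand side.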


7 Evaluation 


We implement our approach as a tool FISCHER for verifying masked programs
in LLVM IR, based on the LLVM framework. We first evaluate FISCHER for
computing affine constants (i.e., Algorithm 2), correctness verification, and
scalability w.r.t. the masking order (i.e., Algorithm 3) on benchmarks using the
ISW scheme. To show the generality of our approach, FISCHER is then used to
verify benchmarks using glitch-resistant Boolean masking schemes and
lattice-based public-key cryptography. All experiments are conducted on a machine
with Linux kernel 5.10, Intel i7 10700 CPU (4.8 GHz, 8 cores, 16 threads) and
40 GB memory. Milliseconds (ms) and seconds (s) are used as the time units in
our experiments.


7.1 Evaluation for Computing Affine Constants 


To evaluate Algorithm 2, we compare with a pure SMT-based approach which
directly checks $\exists c. \forall x, y \in \mathbb{F}.\ f(x \oplus y) = f(x) \oplus f(y) \oplus c$ using Z3 [47],
CVC5 [5] and Boolector [18], by implementing $\oplus$ and $\otimes$ in bit-vector
theory, where $\otimes$ is achieved via the Russian peasant method [16]. Technically,
SMT solvers only deal with satisfiability, but they usually can eliminate the
universal quantifiers in this case, as $x, y$ are over a finite field. In
particular, in our experiment, Z3 is configured with default (i.e. (check-sat)),
simplify (i.e. (check-sat-using (then simplify smt))) and bit-blast (i.e.
(check-sat-using (then bit-blast smt))), denoted by Z3-d, Z3-s and Z3-b,
respectively. We focus on the following functions: $\mathrm{exp}i(x) = x^i$ for $i \in \{2, 4, 8, 16\}$;
$\mathrm{rotl}i(x)$ for $i \in \{1, 2, 3, 4\}$ that left rotates $x$ by $i$ bits;
$\mathrm{af}(x) = \mathrm{rotl1}(x) \oplus \mathrm{rotl2}(x) \oplus \mathrm{rotl3}(x) \oplus \mathrm{rotl4}(x) \oplus 99$ used in AES S-Box;
$L1(x) = 7x^2 \oplus 14x^4 \oplus 7x^8$, $L3(x) = 7x \oplus 12x^2 \oplus 12x^4 \oplus 9x^8$, $L5(x) = 10x \oplus 9x^2$ and
$L7(x) = 4x \oplus 13x^2 \oplus 13x^4 \oplus 14x^8$ used in PRESENT S-Box over GF(16) =
GF(2)[X]/(X4 + X +1) [14,19]; £1(x) = 2°, £2(2) = 2? Or Ol, £3(z) = z Q zë 
and f4(x) = af(exp2(z)) over GF(2°). 
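
For readers unfamiliar with the Russian peasant method, the following sketch shows shift-and-add multiplication over GF(2^8) and uses it to observe that squaring is additive while cubing is not. The reduction polynomial 0x11B (the AES polynomial) and all names are assumptions for illustration; the cube is just a convenient non-affine example and is not claimed to be one of the benchmark functions.

```python
def gf256_mul(a, b, poly=0x11B):
    """Russian peasant multiplication in GF(2^8).

    poly=0x11B (X^8 + X^4 + X^3 + X + 1, the AES polynomial) is an assumption.
    """
    r = 0
    while b:
        if b & 1:
            r ^= a             # add the current multiple of a
        a <<= 1                # double a ...
        if a & 0x100:
            a ^= poly          # ... and reduce modulo the field polynomial
        b >>= 1                # halve b
    return r


SQUARE = [gf256_mul(x, x) for x in range(256)]
CUBE = [gf256_mul(SQUARE[x], x) for x in range(256)]


def is_additive(table):
    """Check table[x ^ y] == table[x] ^ table[y] for all 8-bit x, y."""
    return all(table[x ^ y] == table[x] ^ table[y]
               for x in range(256) for y in range(256))


if __name__ == "__main__":
    print(hex(gf256_mul(0x57, 0x83)))
    print(is_additive(SQUARE), is_additive(CUBE))
```

Squaring is additive in characteristic 2, which is consistent with the affine constant 0 reported for exp2 in Table 1.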


Table 1. Results of computing affine constants, where † means Algorithm 2 needs
SMT solving, ‡ means affineness is disproved via testing, ✗ means nonaffineness, and
Algorithm 2+B means Algorithm 2+Boolector.


| Tool | exp2 | exp4 | exp8 | exp16 | rotl1 | rotl2 | rotl3 | rotl4 | af | L1 | L3 | L5 | L7 | f1 | f2 | f3 | f4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Algorithm 2+Z3-d | 3ms | 3ms | 3ms | 3ms | 18ms† | 18ms† | 18ms† | 18ms† | 21ms† | 3ms | 3ms | 3ms | 3ms | 3ms‡ | 3ms | 3ms‡ | 21ms† |
| Algorithm 2+Z3-b | 3ms | 3ms | 3ms | 3ms | 15ms† | 16ms† | 15ms† | 15ms† | 20ms† | 3ms | 3ms | 3ms | 3ms | 3ms‡ | 3ms | 3ms‡ | 20ms† |
| Algorithm 2+B | 3ms | 3ms | 3ms | 3ms | 8ms† | 8ms† | 8ms† | 8ms† | 13ms† | 3ms | 3ms | 3ms | 3ms | 3ms‡ | 3ms | 3ms‡ | 14ms† |
| Z3-d | 181ms | 333ms | 316ms | 521ms | 14ms | 14ms | 14ms | 14ms | 16ms | 113ms | 213ms | 73ms | 194ms | 33ms | 249ms | 38ms | 7.5s |
| Z3-s | 180ms | 373ms | 452ms | 528ms | 12ms | 12ms | 12ms | 12ms | 15ms | 158ms | 202ms | 194ms | 213ms | 28ms | 252ms | 35ms | 7.6s |
| Z3-b | 15ms | 16ms | 18ms | 20ms | 12ms | 12ms | 12ms | 12ms | 79ms | 45ms | 42ms | 21ms | 82ms | 17ms | 22ms | 24ms | 60ms |
| Boolector | 15ms | 18ms | 12ms | 17ms | 5ms | 5ms | 6ms | 5ms | 71ms | 25ms | 34ms | 27ms | 78ms | 14ms | 15ms | 17ms | 67ms |
| CVC5 | 8.4s | 203s | 44.4s | 186s | 5ms | 5ms | 5ms | 5ms | 113ms | 158.4s | 263.4s | 43.7s | 214.9s | 92ms | 10.3s | 2.3s | 10.4s |
| Result | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 99 | 0 | 0 | 0 | 0 | ✗ | 1 | ✗ | 99 |


The results are reported in Table 1, where the 2nd-8th rows show the
execution time and the last row shows the affine constants if they exist, and ✗
otherwise. We observe that Algorithm 2 significantly outperforms the SMT-based
approach in most cases for all the SMT solvers, except for rotli and af. This is not
surprising, as they use operations other than $\oplus$ and $\otimes$, thus SMT solving is
required. The term rewriting system is often able to compute affine constants
on its own (e.g., expi and Li), and SMT solving is required only for computing the
affine constants of rotli. By comparing the results of Algorithm 2+Z3-b vs.
Z3-b and Algorithm 2+B vs. Boolector on af, we observe that term rewriting is
essential, as checking the normal form instead of the original constraint reduces
the cost of SMT solving.


7.2 Evaluation for Correctness Verification


To evaluate Algorithm 3, we compare it with a pure SMT-based approach
with SMT solvers Z3, CVC5 and Boolector. We also consider several promising
general-purpose software verifiers SMACK (with Boogie and Corral engines),
SeaHorn, CPAChecker and Symbiotic, and one cryptography-specific verifier
CryptoLine (with SMT and CAS solvers), where the verification problem is
expressed using assume and assert statements. Those verifiers are configured in
two ways: (1) with the configuration recommended in the manual/paper or used in the
competition, and (2) with the best configuration found by trying different options.
Specifically:


— CryptoLine (commit 7e237a9). Both solvers SMT and CAS are used; 

— SMACK v2.8.0. integer-encoding: bit-vector, verifier: corral/boogie (both 
used), solver: Z3/CVC4 (Z3 used), static-unroll: on, unroll: 99; 

— SEAHORN v0.1.0 RC3 (commit e712712). pipeline: bpf, arch: m64, inline: 
on, track: mem, bmc: none/mono/path (mono used), crab: on/off (off used); 

— CPAChecker v2.1.1. default.properties with cbmc: on/off (on used); 

— Symbiotic v8.0.0. officially-provided SV-COMP configuration with exit-on- 
error: on. 


The benchmark comprises five different masked programs sec_mult for
finite-field multiplication over $\mathrm{GF}(2^8)$ by varying masking order $d = 0, 1, 2, 3$, where
$d = 0$ means the program is unmasked. We note that sec_mult in [8] is only
available for masking order $d \geq 2$.


Table 2. Results on various sec_mult, where T.O. means time out (20 min), N/A
means UNKNOWN result, and ‡ means that the verification result is incorrect.


| d | Ref. | Algorithm 3 | Z3 default | Z3 simplify | Z3 bit-blast | Boolector | CVC5 | CryptoLine SMT | CryptoLine CAS | SMACK Boogie | SMACK Corral | SeaHorn | CPAChecker | Symbiotic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [58] | 17ms | 29ms | 27ms | 42ms | 25ms | 29ms | 39ms | N/A | 29s | 66s | 132ms | T.O. | 870s |
| 0 | [11] | 20ms | 31ms | 31ms | 45ms | 28ms | 33ms | 35ms | N/A | 46s | 144s | 128ms | T.O. | 899s |
| 0 | [34] | 21ms | 33ms | 31ms | 46ms | 29ms | 33ms | 32ms | N/A | 23s | 43s | 127ms | T.O. | 872s |
| 0 | [21] | 18ms | 30ms | 28ms | 25ms | 26ms | 31ms | 32ms | N/A | 17s | 56s | 130ms | T.O. | 876s |
| 1 | [58] | 18ms | 298ms | 299ms | 391s | 3.8s | T.O. | 469ms | N/A | T.O. | T.O. | 13s | T.O. | T.O. |
| 1 | [11] | 20ms | 299ms | 299ms | 1049s | 1.9s | T.O. | 582ms | N/A | T.O. | T.O. | 13s | T.O. | T.O. |
| 1 | [34] | 24ms | 295ms | 295ms | 1199s | 18s | T.O. | 951ms | N/A | T.O. | T.O. | 14s | T.O. | T.O. |
| 1 | [21] | 20ms | 1180s | 921s | T.O. | 7.7s | T.O. | 21s | N/A | T.O. | T.O. | T.O. | T.O. | T.O. |
| 2 | [58] | 20ms | 41s | 42s | T.O. | T.O. | T.O. | T.O. | N/A | T.O. | T.O. | T.O. | T.O. | T.O. |
| 2 | [11] | 22ms | 4.25s | 4.4s | T.O. | T.O. | T.O. | T.O. | N/A | T.O. | T.O. | T.O. | T.O. | T.O. |
| 2 | [8] | 30ms | 4.25s | 41s | T.O. | T.O. | T.O. | T.O. | N/A | T.O. | 26s‡ | T.O. | T.O. | T.O. |
| 2 | [34] | 29ms | 4.25s | 4.25s | T.O. | T.O. | T.O. | T.O. | N/A | T.O. | T.O. | T.O. | T.O. | T.O. |
| 2 | [21] | 22ms | T.O. | T.O. | T.O. | T.O. | T.O. | T.O. | N/A | T.O. | T.O. | T.O. | T.O. | T.O. |
| 3 | [58] | 21ms | T.O. | T.O. | T.O. | T.O. | T.O. | T.O. | N/A | T.O. | T.O. | T.O. | T.O. | T.O. |
| 3 | [11] | 26ms | T.O. | T.O. | T.O. | T.O. | T.O. | T.O. | N/A | T.O. | T.O. | T.O. | T.O. | T.O. |
| 3 | [8] | 27ms | T.O. | T.O. | T.O. | T.O. | T.O. | T.O. | N/A | T.O. | 1059s‡ | T.O. | T.O. | T.O. |
| 3 | [34] | 29ms | T.O. | T.O. | T.O. | T.O. | T.O. | T.O. | N/A | T.O. | T.O. | T.O. | T.O. | T.O. |
| 3 | [21] | 24ms | T.O. | T.O. | T.O. | T.O. | T.O. | T.O. | N/A | T.O. | T.O. | T.O. | T.O. | T.O. |


The results are shown in Table 2. We can observe that FISCHER is
significantly more efficient than the others, and is able to prove all the cases using
our term rewriting system alone (i.e., without random testing or SMT solving).
With the increase of masking order d, almost all the other tools fail. Both
CryptoLine (with the CAS solver) and CPAChecker fail to verify any of the
cases due to the non-linear operations involved in sec_mult. SMACK with the
Corral engine produces two false positives (marked by ‡ in Table 2). These results
suggest that dedicated verification approaches are required for proving the
correctness of masked programs.


7.3 Scalability of FISCHER 


To evaluate the scalability of FISCHER, we verify different versions of sec_mult
and masked procedures sec_aes_sbox (resp. sec_present_sbox) of S-Boxes
used in AES [58] (resp. PRESENT [19]) with varying masking order d. Since
it is known that refresh_masks in [58] is vulnerable when d > 4 [24], a fixed
version RefreshM [7] is used in all the S-Boxes (except that when sec_mult is
taken from [8] its own version is used). We note that sec_present_sbox uses
the affine transformations L1, L3, L5, L7, exp2 and exp4, while sec_aes_sbox
uses the affine transformations af, exp2, exp4 and exp16.

The results are reported in Table 3. All those benchmarks are proved using
our term rewriting system alone, except for the three incorrect ones marked
by ♮. FISCHER scales up to masking order of 100 or even 200 for sec_mult,
which is remarkable. FISCHER also scales up to masking order of 30 or even
40 for sec_present_sbox. However, it is less scalable on sec_aes_sbox, as it
computes the multiplicative inverse $x^{254}$ on shares, and the size of the term
encoding the equivalence problem explodes with the increase of the masking
order. Furthermore, to better demonstrate the effectiveness of our term rewriting
system in dealing with complicated procedures, we first use Algorithm 2 to derive
affine constants on sec_aes_sbox with ISW [58] and then directly apply SMT
solvers to solve the correctness constraints obtained at line 9 of Algorithm 3.
This takes about 1s to obtain the result on the first-order masking, but fails to
obtain the result within 20 min on the second-order masking.


Table 3. Results on sec_mult and S-Boxes, where T.O. means time out (20 min), and
♮ means that the program is incorrect.


| Ref. | sec_mult | | | | | | sec_present_sbox | | | | | | | sec_aes_sbox | | | |
| d | 5 | 10 | 20 | 50 | 100 | 200 | 1 | 2 | 5 | 10 | 20 | 30 | 40 | 1 | 2 | 4 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ISW [58] | 23ms | 33ms | 84ms | 1.0s | 15s | 545s | 44ms | 51ms | 93ms | 535ms | 14s | 118s | T.O. | 87ms | 234ms | 25s | 160s |
| ISW [11] | 26ms | 44ms | 100ms | 712ms | 7.3s | 212s | 54ms | 63ms | 110ms | 673ms | 17s | 163s | T.O. | 108ms | 265ms | 23s | 142s |
| ISW [8] | 36ms♮ | 49ms | 109ms | 601ms | 3.2s | 18s | - | 86ms | 142ms♮ | 237ms | 841ms | 2.4s | 5.3s | - | 559ms | 9.7s | 142s♮ |
| ISW [34] | 34ms | 50ms | 98ms | 518ms | 3.1s | 19s | 67ms | 91ms | 137ms | 700ms | 20s | 173s | T.O. | 140ms | 571ms | 63s | T.O. |
| ISW [21] | 30ms | 109ms | 224ms | 5.0s | 152s | T.O. | 51ms | 61ms | 113ms | 354ms | 2.4s | 9.7s | 29s | 133ms | 269ms | 13s | 68s |


Table 4. Results on sec_mult and S-Boxes for HPC, DOM and CMS.


| Ref. | sec_mult | | | | | | sec_present_sbox | | | | | sec_aes_sbox | | | | |
| d | 0 | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HPC1 [20] | 28ms | 30ms | 32ms | 35ms | 39ms | 42ms | 63ms | 72ms | 84ms | 98ms | 117ms | 104ms | 254ms | 1.8s | 13s | 67s |
| HPC2 [20] | 23ms | 25ms | 26ms | 28ms | 31ms | 33ms | 57ms | 66ms | 75ms | 92ms | 110ms | 92ms | 244ms | 1.9s | 13s | 65s |
| DOM [35] | 24ms | 24ms | 25ms | 26ms | 28ms | 29ms | 52ms | 60ms | 67ms | 77ms | 90ms | 80ms | 223ms | 1.8s | 12s | 66s |
| CMS [57] | - | - | 24ms | - | - | - | - | 53ms | - | - | - | - | 211ms | - | - | - |


A highlight of our findings is that FISCHER reports that sec_mult from [8]
and the S-boxes based on this version are incorrect when d = 5. After a careful
analysis, we found that it is indeed incorrect for any $d \equiv 1 \pmod 4$ (i.e., 5, 9, 13,
etc.). This is because [8] parallelizes the multiplication over the entire encodings
(i.e., tuples of shares) while the parallelized computation depends on the value
of d mod 4. When the remainder is 1, the error occurs.


7.4 Evaluation for More Boolean Masking Schemes 


To demonstrate the applicability of FISCHER on a wider range of Boolean
masking schemes, we further consider glitch-resistant Boolean masking schemes:
HPC1, HPC2 [20], DOM [35] and CMS [57]. We implement the finite-field
multiplication sec_mult using those masking schemes, as well as masked versions
of AES S-box and PRESENT S-box. We note that our implementation of DOM
sec_mult is derived from [20], and we only implement the 2nd-order CMS
sec_mult due to the difficulty of implementation. All other experimental
settings are the same as in Sect. 7.3.

The results are shown in Table 4. Our term rewriting system alone is able to
efficiently prove the correctness of finite-field multiplication sec_mult and the masked
versions of AES S-box and PRESENT S-box using the glitch-resistant Boolean
masking schemes HPC1, HPC2, DOM and CMS. The verification cost of those
benchmarks is similar to that of benchmarks using the ISW scheme,
demonstrating the applicability of FISCHER for various Boolean masking schemes.


Table 5. Results on sec_add, sec_add_modp and sec_a2b [17], where T.O. means time
out (20 min).


| | sec_add | | | | | | | sec_add_modp | | | | | | sec_a2b | | | | | | |
| d \ k | 2 | 3 | 4 | 6 | 8 | 12 | 16 | 2 | 3 | 4 | 6 | 8 | 12 | 2 | 3 | 4 | 6 | 8 | 12 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 34ms | 38ms | 42ms | 51ms | 61ms | 83ms | 109ms | 97ms | 248ms | 805ms | 7.5s | 44s | 623s | 41ms | 48ms | 55ms | 70ms | 87ms | 121ms | 156ms |
| 2 | 35ms | 40ms | 45ms | 55ms | 65ms | 91ms | 124ms | 111ms | 331ms | 1.1s | 11s | 67s | 936s | 58ms | 74ms | 93ms | 134ms | 199ms | 523ms | 1.5s |
| 3 | 36ms | 42ms | 47ms | 58ms | 71ms | 100ms | 139ms | 127ms | 417ms | 1.5s | 15s | 89s | T.O. | 73ms | 93ms | 118ms | 182ms | 293ms | 927ms | 3.0s |
| 4 | 38ms | 44ms | 50ms | 62ms | 76ms | 109ms | 155ms | 144ms | 506ms | 1.98s | 18s | 112s | T.O. | 93ms | 130ms | 190ms | 676ms | 3.3s | 49s | 366s |
| 5 | 39ms | 45ms | 51ms | 66ms | 81ms | 118ms | 168ms | 160ms | 586ms | 2.2s | 22s | 136s | T.O. | 109ms | 159ms | 256ms | 1.1s | 6.5s | 100s | 746s |


7.5 Evaluation for Arithmetic/Boolean Masking Conversions


To demonstrate the applicability of FISCHER beyond masked
implementations of symmetric cryptography, we further evaluate FISCHER on three key
non-linear building blocks for bitsliced, masked implementations of lattice-based
post-quantum key encapsulation mechanisms (KEMs [17]). Note that KEMs are
a class of encryption techniques designed to secure symmetric cryptographic
key material for transmission using asymmetric (public-key) cryptography. We
implement the Boolean masked addition modulo $2^k$ (sec_add), Boolean masked
addition modulo p (sec_add_modp) and the arithmetic-to-Boolean masking
conversion modulo $2^k$ (sec_a2b) for various bit-widths k and masking orders d, where
p is the largest prime number less than $2^k$. Note that some bitwise operations
(e.g., circular shift) are expressed by affine transformations, and the modulo
addition is implemented by the simulation algorithm [17] in our
implementations.

The results are reported in Table 5. FISCHER is able to efficiently prove the
correctness of these functions with various masking orders (d) and bit-widths (k),
using the term rewriting system alone. With the increase of the bit-width k (resp.
masking order d), the verification cost increases more quickly for sec_add_modp
(resp. sec_a2b) than for sec_add. This is because sec_add_modp with bit-width
k invokes sec_add three times, two of which have the bit-width k + 1, and the
number of calls to sec_add in sec_a2b increases with the masking order d though
using the same bit-width as sec_a2b. These results demonstrate the applicability
of FISCHER for asymmetric cryptography.


8 Conclusion 


We have proposed a term rewriting based approach to proving functional
equivalence between masked cryptographic programs and their original unmasked
algorithms over $\mathrm{GF}(2^n)$. Based on this approach, we have developed a tool FISCHER
and carried out extensive experiments on various benchmarks. Our evaluation
confirms the effectiveness, efficiency and applicability of our approach.

For future work, it would be interesting to further investigate the theoretical
properties of the term rewriting system. Moreover, we believe the term rewriting
approach extended with more operations may have greater potential in
verifying more general cryptographic programs, e.g., those from standard software
libraries such as OpenSSL.


Abstract. In deductive verification and software model checking, dealing
with certain specification language constructs can be problematic
when the back-end solver is not sufficiently powerful or lacks the required 
theories. One way to deal with this is to transform, for verification pur- 
poses, the program to an equivalent one not using the problematic con- 
structs, and to reason about its correctness instead. In this paper, we 
propose instrumentation as a unifying verification paradigm that sub- 
sumes various existing ad-hoc approaches, has a clear formal correctness 
criterion, can be applied automatically, and can transfer back witnesses 
and counterexamples. We illustrate our approach on the automated ver- 
ification of programs that involve quantification and aggregation opera- 
tions over arrays, such as the maximum value or sum of the elements in 
a given segment of the array, which are known to be difficult to reason 
about automatically. We implement our approach in the MONOCERA 
tool, which is tailored to the verification of programs with aggregation, 
and evaluate it on example programs, including SV-COMP programs. 


1 Introduction 


Overview. Program specifications are often written in expressive, high-level 
languages: for instance, in temporal logic [14], in first-order logic with quan- 
tifiers [28], in separation logic [40], or in specification languages that provide 
extended quantifiers for computing the sum or maximum value of array ele- 
ments [7,33]. Specifications commonly also use a rich set of theories; for instance, 
specifications could be written using full Peano arithmetic, as opposed to bit- 
vectors or linear arithmetic used in the program. Rich specification languages 
make it possible to express intended program behaviour in a succinct form, and 
as a result reduce the likelihood of mistakes being introduced in specifications. 

There is a gap, however, between the languages used in specifications and 
the input languages of automatic verification tools. Software model checkers, in 
particular, usually require specifications to be expressed using program assertions 
and Boolean program expressions, and do not directly support any of the more
sophisticated language features mentioned. In fact, rich specification languages 
are challenging to handle in automatic verification, since satisfiability checks 
can become undecidable (i.e., it is no longer decidable whether assertion failures 
can occur on a program path), and techniques for inferring program invariants 
usually focus on simple specifications only. 

To bridge this gap, it is common practice to encode high-level
specifications in the low-level assertion languages understood by the tools. For instance,
temporal properties can be translated to Büchi automata, and added to
programs using ghost variables and assertions [14]; quantified properties can be
replaced with non-determinism, ghost variables, or loops [13,37]; sets used to
specify the absence of data-races can be represented using non-deterministically
initialized variables [18]. By adding ghost variables and bespoke ghost code to
programs [22], many specifications can be made effectively checkable.

The translation of specifications to assertions or ghost code is today largely 
designed, or even carried out, by hand. This is an error-prone process, and for 
complex specifications and programs it is very hard to ensure that the low-level 
encoding of a specification faithfully models the original high-level properties to 
be checked. Mistakes have been found even in industrial, very carefully developed 
specifications [39], and can result in assertions that are vacuously satisfied by 
any program. Naturally, the manual translation of specifications also tends to 
be an ad-hoc process that does not easily generalise to other specifications. 

This paper proposes the first general framework to automate the translation 
of rich program specifications to simpler program assertions, using a process 
called instrumentation. Our approach models the semantics of specific complex 
operations using program-independent instrumentation operators, consisting of 
(manually designed) rewriting rules that define how the evaluation of the opera- 
tor can be achieved using simpler program statements and ghost variables. The 
instrumentation approach is flexible enough to cover a wide range of different 
operators, including operators that are best handled by weaving their evaluation 
into the program to be analysed. While instrumentation operators are manually 
written, their application to programs can be performed in a fully automatic way 
by means of a search procedure. The soundness of an instrumentation operator is 
shown formally, once and for all, by providing an instrumentation invariant that 
ensures that the operator can never be used to show correctness of an incorrect 
program. 

Additional instrumentation operator definitions, correctness proofs, and 
detailed evaluation results can be found in the accompanying extended report [4]. 


Motivating Example. We illustrate our approach on the computation of
triangular numbers $s_N = (N^2 + N)/2$, see left-hand side of Fig. 1. For reasons of
presentation, the program has been normalised by representing the square N*N
using an auxiliary variable NN. While mathematically simple, verifying the
post-condition s == (NN+N)/2 in the program turns out to be challenging even for
state-of-the-art model checkers, as such tools are usually thrown off course by
     1 // Triangular numbers              1 // Instrumented program
     2 i = 0; /*A*/ s = 0; /*B*/          2 i = 0; s = 0; x_sq = 0; x_shad = 0;
     3 assume(N > 0);                     3 assume(N > 0);
     4 while (i < N) {                    4 while (i < N) {
     5                                    5   // Begin-instrumentation
     6                                    6   assert(i == x_shad);
     7                                    7   x_sq = x_sq + 2*i + 1;
     8   i = i + 1; /*C*/                 8   i = i + 1;
     9                                    9   x_shad = i;
    10                                   10   // End-instrumentation
    11   s = s + i;                      11   s = s + i;
    12 }                                 12 }
    13                                   13 // Begin-instrumentation
    14                                   14 assert(N == x_shad);
    15 NN = N*N; /*D*/                   15 NN = x_sq;
    16                                   16 // End-instrumentation
    17 assert(s == (NN+N)/2);            17 assert(s == (NN+N)/2);


Fig. 1. Program computing triangular numbers, and its instrumented counterpart


the non-linear term N*N. Computing the value of NN by adding a loop in line 16 
is not sufficient for most tools either, since the program in any case requires 
a non-linear invariant 0 <= i <= N && 2*s == i*i + i to be derived for the 
loop in lines 4-12. 

The insight needed to elegantly verify the program is that the value i*i can 
be tracked during the program execution using a ghost variable x_sq. For this, 
the program is instrumented to maintain the relationship x_sq == i*i: initially, 
i == x_sq == 0, and each time the value of i is modified, also the variable x_sq 
is updated accordingly. With the value x_sq == i*i available, both the loop 
invariant and the post-condition turn into formulas over linear arithmetic, and 
program verification becomes largely straightforward. The challenge, of course, 
is to discover this program transformation automatically, and to guarantee the 
soundness of the process. For the example, the transformed program is shown 
on the right-hand side of Fig. 1, and discussed in the next paragraphs. 
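
The effect of the instrumentation can be observed by simply running the instrumented program for concrete inputs. The following is a direct transliteration of the right-hand side of Fig. 1 into Python (the function name `triangular_instrumented` and the extra loop-head assertion are additions for illustration). Note that the ghost variable x_sq is only ever updated with linear expressions in i, and that the non-linear loop invariant 2*s == i*i + i becomes the linear check 2*s == x_sq + i.

```python
def triangular_instrumented(N):
    """Run the instrumented program of Fig. 1 for a concrete N > 0."""
    assert N > 0                           # assume(N > 0)
    i = s = x_sq = x_shad = 0
    while i < N:
        assert 2 * s == x_sq + i           # loop invariant, now linear in x_sq
        # Begin-instrumentation
        assert i == x_shad
        x_sq = x_sq + 2 * i + 1            # (i + 1)^2 = i^2 + 2*i + 1
        i = i + 1
        x_shad = i
        # End-instrumentation
        s = s + i
    # Begin-instrumentation
    assert N == x_shad
    NN = x_sq                              # replaces NN = N*N
    # End-instrumentation
    assert s == (NN + N) // 2
    return s


if __name__ == "__main__":
    print([triangular_instrumented(n) for n in range(1, 6)])
```

Running the program only tests individual inputs, of course; the point of the instrumentation is that a verifier can now establish the same assertions for all N using linear reasoning.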

Our method splits the process of program instrumentation into two parts: 
(i) choosing an instrumentation operator, which is defined manually, designed to 
be program-independent, and induces a space of possible program transforma- 
tions; and (ii) carrying out an automatic application strategy to find, among the 
possible program transformations, one that enables verification of a program. 

An instrumentation operator for tracking squares is shown in Fig. 2, and con- 
sists of the declaration of two ghost variables (x_sq, x_shad) with initial value 0, 
respectively; four rules for rewriting program statements; and the instrumenta- 
tion invariant witnessing correctness of the operator. The rewrite rules use formal 
variables x,y, which can represent arbitrary variables in the program (i, N, NN). 
An application of the operator to a program will declare the ghost variables 
in the form of global variables, and then rewrite some chosen set of program 
statements using the provided rules. Since the statements to be rewritten can 
Ω_square (Instrumentation operator)

  G_square (Ghost variables)
    x_sq, x_shad : Int
    init(x_sq) = 0, init(x_shad) = 0

  R_square (Rewrite rules)
    x = α       ⇝  x = α; x_sq = α²; x_shad = x                        (R1)

    x = x + α   ⇝  assert(x == x_shad);                                (R2)
                   x_sq = x_sq + 2*α*x + α²; x = x + α; x_shad = x

    x = α * x   ⇝  assert(x == x_shad);                                (R3)
                   x_sq = α² * x_sq; x = α * x; x_shad = x

    y = x * x   ⇝  assert(x == x_shad); y = x_sq                       (R4)

  I_square (Instrumentation invariant)
    x_sq = x_shad²


Fig. 2. Definition of an instrumentation operator Ω_square for tracking squares


be chosen arbitrarily, and since moreover multiple rewrite rules might apply to 
some statements, rewriting can result in many different variants of a program. 
In the example, we rewrite the assignments C, D of the left-hand side program 
using rewrite rules (R2) and (R4), respectively, resulting in the instrumented 
and correct program on the right-hand side. 

Instrumentation operators are designed to be sound, which means that 
rewriting a wrong selection of program statements might lead to an instru- 
mented program that cannot be verified, i.e., in which assertions might fail, 
but instrumentation can never turn an incorrect source program into a correct 
instrumented program. This opens up the possibility to systematically search 
for the right program instrumentation. We propose a counterexample-guided 
algorithm for this purpose, which starts from some arbitrarily chosen instru- 
mentation, checks whether the instrumented program can be verified, and oth- 
erwise attempts to fix the instrumentation using a refinement loop. As soon as 
a verifiable instrumented program has been found, the search can stop and the 
correctness of the original program has been shown. 

The concept of instrumentation invariants is essential for guaranteeing sound- 
ness of an operator. Instrumentation invariants are formulas that can (only) 
refer to the ghost variables introduced by an instrumentation operator, and are 
formulated in such a way that they hold in every reachable state of every instru- 
mented program. To maintain their invariants, instrumentation operators use 
shadow variables that duplicate the values of program variables. In the operator 
in Fig. 2, the purpose of the shadow variable x_shad is to reproduce the value 
of the program variable whose square is tracked (i). The rewriting rules intro- 
duce guards to detect incorrect instrumentation (the assertions in (R2), (R3), 
(R4)), which are particular cases in which some update of a relevant variable 
was missed and not correctly instrumented. The use of shadow variables and
guards makes instrumentation operators very flexible; in our example, note that
instrumentation tracks the square of the value of i during the loop, but is also 
used later to simplify the expression N*N. This is possible because of the instru- 
mentation invariant and because i == N holds after termination of the loop, 
which is verified through the assertion introduced in line 14. 
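
The interplay of ghost variable, shadow variable, and guards can be tried out in a few lines of Python. This is only a sketch under stated assumptions: the class `SquareInstrumentation`, the function `run`, and the small loop it executes are hypothetical stand-ins and are not the program of Fig. 1; only the rules (R1), (R2), (R4) and the invariant are taken from Fig. 2.

```python
class SquareInstrumentation:
    """Ghost state of the square operator: x_sq tracks the square of x_shad."""

    def __init__(self):
        self.x_sq = 0    # init(x_sq) = 0
        self.x_shad = 0  # init(x_shad) = 0

    def invariant(self):
        # Instrumentation invariant: x_sq = x_shad^2
        return self.x_sq == self.x_shad ** 2

    def assign(self, alpha):
        # (R1)  x = alpha
        x = alpha
        self.x_sq = alpha * alpha
        self.x_shad = x
        return x

    def add(self, x, alpha):
        # (R2)  x = x + alpha, guarded by x == x_shad
        assert x == self.x_shad, "guard: x is not the tracked variable"
        self.x_sq = self.x_sq + 2 * alpha * x + alpha * alpha
        x = x + alpha
        self.x_shad = x
        return x

    def square(self, x):
        # (R4)  y = x * x, answered from the ghost variable
        assert x == self.x_shad, "guard: x is not the tracked variable"
        return self.x_sq


def run(n, instrument_increment=True):
    """Hypothetical loop counting i up to n, then asking for i * i."""
    ghost = SquareInstrumentation()
    i = 0
    while i < n:
        if instrument_increment:
            i = ghost.add(i, 1)
        else:
            i = i + 1  # an update that the instrumentation missed
        assert ghost.invariant()
    return ghost.square(i)
```

With the increment instrumented, `run(5)` returns the square without ever multiplying two program variables. With `instrument_increment=False` the invariant still holds, but the guard in `square` fails, which is how a missed update is detected.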


Contributions and Outline. The operator shown in Fig. 2 is simple, and does 
not apply to all programs, but it can easily be generalised to other arithmetic 
operators and program statements. The framework presented in this paper provides
the foundation for developing an (extendable) library of formally verified
instrumentation operators. In the scope of this paper, we focus on two speci- 
fication constructs that have been identified as particularly challenging in the 
literature: existential and universal quantifiers over arrays, and aggregation (or 
extended quantifiers), which includes computing the sum or maximum value 
of elements in an array. Our experiments on benchmarks taken from the SV- 
COMP [8] show that even relatively simple instrumentation operators can sig- 
nificantly extend the capabilities of a software model checker, and often make 
the automatic verification of otherwise hard specifications easy. 

The contributions of the paper are: (i) a general framework for program
instrumentation, which defines a space of program transformations that work
by rewriting individual statements (Sect. 2); (ii) an application strategy search
algorithm in this space, for a given program (Sect. 3); (iii) two instantiations of
the framework: one for instrumentation operators to handle specifications with
quantifiers (Sect. 4.1), and one for extended quantifiers (Sect. 4.2); (iv) machine-checked
proofs of the correctness of the instrumentation operators for the quantifier
∀ and the extended quantifier \max; (v) a new verification tool, MonoCera,
that is tailored to the verification of programs with aggregation; and (vi)
an evaluation of our method and tool on a set of examples, including examples from
SV-COMP [8] (Sect. 5).


2 Instrumentation Framework 


The next two sections formally introduce the instrumentation framework. Later, 
we instantiate the framework for quantification and aggregation over arrays. We 
split the instrumentation process into two parts: 


1. An instrumentation operator that defines how to rewrite program statements 
with the purpose of eliminating language constructs that are difficult to rea- 
son about automatically, but leaves the choice of which occurrences of these 
statements to rewrite to the second part (this section). 

2. An application strategy for the instrumentation operator, which can be implemented
using heuristics or systematic search, among others. The strategy is
responsible for selecting the right (if any) program instrumentation from the
many possible ones. Sect. 3 is dedicated to the second part.
Table 1. Syntax of the core language.


⟨Type⟩ ::= Int | Bool | Array ⟨Type⟩
⟨Expr⟩ ::= ⟨DecimalNumber⟩ | true | false | ⟨Variable⟩
         | ⟨Expr⟩ == ⟨Expr⟩ | ⟨Expr⟩ <= ⟨Expr⟩ | !⟨Expr⟩ | ⟨Expr⟩ && ⟨Expr⟩
         | ⟨Expr⟩ || ⟨Expr⟩ | ⟨Expr⟩ + ⟨Expr⟩ | ⟨Expr⟩ * ⟨Expr⟩
         | select(⟨Expr⟩, ⟨Expr⟩) | store(⟨Expr⟩, ⟨Expr⟩, ⟨Expr⟩)
⟨Prog⟩ ::= skip | ⟨Variable⟩ = ⟨Expr⟩ | ⟨Prog⟩; ⟨Prog⟩ | while (⟨Expr⟩) ⟨Prog⟩
         | assert(⟨Expr⟩) | assume(⟨Expr⟩) | if (⟨Expr⟩) ⟨Prog⟩ else ⟨Prog⟩


Even though instrumentation operators are non-deterministic, we shall guaran- 
tee their soundness: if the original program has a failing assertion, so will any 
instrumented program, regardless of the chosen application strategy; that is, 
instrumentation of an incorrect program will never yield a correct program. 

We shall also guarantee a weak form of completeness, to the effect that if an 
assertion that has not been added to the program by the instrumentation fails 
in the instrumented program, then it will also fail in the original program. As a 
result, any counterexample (for such an assertion) produced when verifying the 
instrumented program can be transformed into a counterexample for the original 
program. 


2.1 The Core Language 


While our implementation works on programs represented as constrained Horn 
clauses [12], i.e., is language-agnostic, for readability purposes we present our 
approach in the setting of an imperative core programming language with data- 
types for unbounded integers, Booleans, and arrays, and assert and assume 
statements. The language is deliberately kept simple, but is still close to stan- 
dard C. The main exception is the semantics of arrays: they are defined here 
to be functional and therefore represent a value type. Arrays have integers as 
index type and are unbounded, and their signature and semantics are otherwise 
borrowed from the SMT-LIB theory of extensional arrays [6]: 


— Reading the value of an array a at index i: select(a, i); 
— Updating an array a at index i with a new value x: store(a, i, x). 


The complete syntax of the core language is given in Table 1. Programs are 
written using a vocabulary X of typed program variables; the typing rules of the
language are given in [4]. As syntactic sugar, we sometimes write a[i] instead 
of select(a, i), and a[i] = x instead of a = store(a, i, x). 
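
The value semantics of arrays can be illustrated with a small Python sketch. It rests on assumptions that are not part of the paper: arrays are modelled as dictionaries from integer indices to values, unwritten indices read as a hypothetical `default`, and `store` copies its argument so that the original array value is never modified.

```python
def store(a, i, x):
    """Return a new array that equals `a` except that index i holds x."""
    b = dict(a)  # copy, so the array passed in keeps its value
    b[i] = x
    return b


def select(a, i, default=0):
    """Read index i; indices never written yield the assumed default."""
    return a.get(i, default)
```

Because `store` returns a fresh value, the sugar `a[i] = x` corresponds to rebinding the variable, `a = store(a, i, x)`, rather than to an in-place update.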

We denote by D_σ the domain of a program type σ. The domain of an array
type Array σ is the set of functions f : ℤ → D_σ.


Semantics. We assume the Flanagan-Saxe extended execution model of programs
with assume and assert statements (see, e.g., [23]), in which executing
an assert statement with an argument that evaluates to false fails, i.e., terminates
abnormally. An assume statement with an argument that evaluates to false
has the same semantics as a non-terminating loop. Partial correctness properties
of programs are expressed using Hoare triples {Pre} P {Post}, which state
that an execution of P, starting in a state satisfying Pre, never fails, and may
only terminate in states that satisfy Post. As usual, a program P is considered
(partially) correct if the Hoare triple {true} P {true} holds.

The evaluation of program expressions is modelled using a function ⟦·⟧_s that
maps program expressions t of type σ to their value ⟦t⟧_s ∈ D_σ in the state s.


2.2 Instrumentation Operators 


An instrumentation operator defines schemes to rewrite programs while preserv- 
ing the meaning of the existing program assertions. Without loss of generality, 
we restrict program rewriting to assignment statements. Instrumentation can 
introduce ghost state by adding arbitrary fresh variables to the program. The 
main part of an instrumentation consists of rewrite rules, which are schematic 
rules r = t ~> s, where the meta-variable r ranges over program variables, t is
an expression that can contain further meta-variables, and s is a schematic pro- 
gram in which the meta-variables from r = t might occur. Any assignment that 
matches r = t can be rewritten to s. 


Definition 1 (Instrumentation Operator). An instrumentation operator
is a tuple Ω = (G, R, I), where:


(i) G = ((x_1, init_1), ..., (x_k, init_k)) is a tuple of pairs of ghost variables and
their initial values;
(ii) R is a set of rewrite rules r = t ~> s, where s is a program operating on the
ghost variables x_1, ..., x_k (and containing meta-variables from r = t);
(iii) I is a formula over the ghost variables x_1, ..., x_k, called the instrumentation
invariant.


The rewrite rules R and the invariant I must adhere to the following constraints:


1. The instrumentation invariant I is satisfied by the initial ghost values, i.e.,
it holds in the state {x_1 ↦ init_1, ..., x_k ↦ init_k}.
2. For all rewrites r = t ~> s ∈ R the following hold:
(a) s terminates (normally or abnormally) for pre-states satisfying I, assuming
that all meta-variables are ordinary program variables.
(b) s does not assign to variables other than r or the ghost variables x_1, ..., x_k.
(c) s preserves the instrumentation invariant: {I} s' {I}, where s' is s with
every assert(e) statement replaced by an assume(e) statement.
(d) s preserves the semantics of the assignment r = t: the Hoare triple
{I} z = t; s {z = r}, where z is a fresh variable, holds.
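
As an informal illustration of constraints 2(c) and 2(d), the following Python sketch tests them exhaustively for rule (R2) of the square operator from Fig. 2 on a small range of integers. It is a bounded test rather than a proof, the function name `check_r2` is hypothetical, and the guard of (R2) is treated as an assume in both checks, which is an assumption of this sketch for 2(d).

```python
def check_r2(bound=5):
    """Bounded check of constraints 2(c) and 2(d) for rule (R2)."""
    vals = range(-bound, bound + 1)
    for x in vals:
        for alpha in vals:
            for x_shad in vals:
                x_sq = x_shad * x_shad  # pre-state satisfies I
                z = x + alpha           # z = t: the original assignment
                if x != x_shad:
                    continue            # guard read as assume: path is pruned
                # Body of (R2) after the guard
                new_sq = x_sq + 2 * alpha * x + alpha * alpha
                new_x = x + alpha
                new_shad = new_x
                if new_sq != new_shad * new_shad:
                    return False        # 2(c) violated: I not preserved
                if new_x != z:
                    return False        # 2(d) violated: x differs from z
    return True
```

The check succeeds because (x + α)² = x² + 2αx + α², which is exactly the update applied to x_sq.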
The conditions imposed in the definition ensure that all instrumentations
are correct, in the sense that they are sound and weakly complete, as we show 
below. In particular, the instrumentation invariant guarantees that the rewrites 
of program statements are semantics-preserving w.r.t. the original program, and 
thus, the execution of any assert statement of the original program has the 
same effect before and after instrumentation. Observe that the conditions can 
themselves be deductively verified to hold for each concrete instrumentation 
operator, and that this check is independent of the programs to be instrumented, 
so that an instrumentation operator can be proven correct once and for all. 

An instrumentation operator Ω does not itself define which occurrences of
program statements are to be rewritten, but only how they are rewritten. Given
a program P and the operator Ω, an instrumented program P' is derived by
carrying out the following two steps: (i) variables x_1, ..., x_k and the assignments
x_1 = init_1; ...; x_k = init_k are added at the beginning of the program, and
(ii) some of the assignments in P, to which a rewriting rule r = t ~> s in Ω is
applicable, are replaced by s, substituting meta-variables with the actual terms
occurring in the assignment. We denote by Ω(P) the set of all instrumented
programs P' that can be derived in this way. An example of an instrumentation
operator and its application was shown in Fig. 1 and Fig. 2.


2.3 Instrumentation Correctness 


Verification of an instrumented program produces one of two possible results: a 
witness if verification is successful, or a counterexample otherwise. A witness con- 
sists of the inductive invariants needed to verify the program, and is presented in 
the context of the programming language: it is translated back from the back-end 
theory used by the verification tool, and is a formula over the program variables 
and the ghost variables added during instrumentation. A counterexample is an 
execution trace leading to a failing assertion. 


Definition 2 (Soundness). An instrumentation operator Ω is called sound
if for every program P and instrumented program P' ∈ Ω(P), whenever there
is an execution of P where some assert statement fails, then there also is an
execution of P' where some assert statement fails.


Equivalently, existence of a witness for an instrumented program entails exis- 
tence of a witness for the original program, in the form of a set of inductive 
invariants solely over the program variables. Notably, because of the semantics- 
preserving nature of the rewrites under the instrumentation invariant, a witness 
for the original program can be derived from one for the instrumented program. 
One such back-translation is to add the instrumentation invariant as a conjunct 
to the original witness, and to existentially quantify over the ghost variables. 


Example. To illustrate the back-translation, we return to the instrumentation 
operator from Fig. 2 and the example program from Fig. 1. The witness produced 
by our verification tool in this case is the formula: 


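\[ i = x_{\mathit{shad}} \land x_{\mathit{sq}} + x_{\mathit{shad}} = 2s \land N \geq i \land N \geq 1 \land 2s \geq i \land i \geq 0 \]

After conjoining the instrumentation invariant x_sq = x_shad² and existentially
quantifying over the involved ghost variables, we obtain an inductive invariant
that is sufficient to verify the original program:


\[ \exists x_{\mathit{sq}}, x_{\mathit{shad}}.\ \bigl( i = x_{\mathit{shad}} \land x_{\mathit{sq}} + x_{\mathit{shad}} = 2s \land N \geq i \land N \geq 1 \land 2s \geq i \land i \geq 0 \land x_{\mathit{sq}} = x_{\mathit{shad}}^2 \bigr) \]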


Definition 3 (Weak Completeness). The operator Ω is called weakly complete
if for every program P and instrumented program P' ∈ Ω(P), whenever an
assert statement that has not been added to the program by the instrumentation
fails in the instrumented program P', then it also fails in the original program P.


Similarly to the back-translation of invariants, when verification fails, counterex- 
amples for assertions of the original program, found during verification of the 
instrumented program, can be translated back to counterexamples for the orig- 
inal program. We thus obtain the following result. 


Theorem 1 (Soundness and weak completeness). Every instrumentation
operator Ω is sound and weakly complete.


Proof. Let Ω = (G, R, I) be an instrumentation operator. Since I is a formula
over ghost variables only, which holds initially and is preserved by all rewrites,
I is an invariant of the fully instrumented program. This entails that rewrites of
assignments are semantics-preserving. Furthermore, since instrumentation code
only assigns to ghost variables or to r (i.e., the left-hand side of the original statement),
program variables have the same valuation in the instrumented program
as in the original one. Furthermore, since all rewrites are terminating under I,
the instrumented program will terminate if and only if the original program does.

In the case when verification succeeds, and a witness is produced, weak com- 
pleteness follows vacuously. A witness consists of the inductive invariants suffi- 
cient to verify the instrumented program. Thus, they are also sufficient to verify 
the assertions existing in the original program, since assertions are not rewrit- 
ten and all program variables have the same valuation in the original and the 
instrumented programs. Since a witness for the instrumented program can be 
back-translated to a witness for the original program, any failing assertion in the 
original program must also fail after instrumentation, and Ω is therefore sound.

In the case when verification fails, soundness follows vacuously, and if the 
failing assertion was added during instrumentation, also weak completeness fol- 
lows. If the assertion existed in the original program, since such assertions are 
not rewritten, and since program variables have the same valuation in the instru- 
mented program as in the original program, then any counterexample for the 
instrumented program is also a counterexample for the original program, when 
projected onto the program variables. 

Input:  Program P; control points C; instrumentation space R; oracle IsCorrect.
Result: Instrumentation r ∈ R with IsCorrect(P_r); Incorrect; or Inconclusive.

 1 begin
 2   Cand ← R;
 3   while Cand ≠ ∅ do
 4     pick r ∈ Cand;
 5     if IsCorrect(P_r) then
 6       return r;
 7     else
 8       cex ← counterexample path for P_r;
 9       if failing assertion in cex also exists in P then
           /* cex is also a counterexample for P */
10         return Incorrect;
11       else
           /* instrumentation on cex may have been incorrect */
12         C' ← {p ∈ C | ins_r(p) occurs on cex};
13         Cand ← Cand \ {r' ∈ Cand | r(p) = r'(p) for all p ∈ C'};
14       end
15     end
16   end
17   return Inconclusive;
18 end


Algorithm 1: Counterexample-guided instrumentation search


3 Instrumentation Application Strategies 


We will now define a counterexample-guided search procedure to discover appli- 
cations of instrumentation operators that make it possible to verify a program. 

For our algorithm, we assume that we are given an oracle IsCorrect that 
is able to check the correctness of programs after instrumentation. Such an 
oracle could be approximated, for instance, using a software model checker. 
The oracle is free to ignore the complex functions we are trying to eliminate 
by instrumentation; for instance, in Fig. 1, the oracle can over-approximate the 
term N*N by assuming that it can have any value. We further assume that C 
is the set of control points of a program P corresponding to the statements to 
which a given set of instrumentation operators can be applied. For each control 
point p ∈ C, let Q(p) be the set of rewrite rules applicable to the statement
at p, including also a distinguished value ⊥ that expresses that p is not modified.
For the program in Fig. 1, for instance, the choices could be defined by
Q(A) = Q(B) = {(R1), ⊥}, Q(C) = {(R2), ⊥}, and Q(D) = {(R4), ⊥}, referring
to the rules in Fig. 2. Any function r : C → ⋃_{p∈C} Q(p) with r(p) ∈ Q(p)

Table 2. Extension of the core language with quantified expressions.


⟨Expr⟩ ::= (λ(⟨Variable⟩, ⟨Variable⟩).⟨Expr⟩)(⟨Expr⟩, ⟨Expr⟩)
         | forall(⟨Expr⟩, ⟨Expr⟩, ⟨Expr⟩, λ(⟨Variable⟩, ⟨Variable⟩).⟨Expr⟩)
         | exists(⟨Expr⟩, ⟨Expr⟩, ⟨Expr⟩, λ(⟨Variable⟩, ⟨Variable⟩).⟨Expr⟩)


will then define one possible program instrumentation. We will denote the set
of well-typed functions C → ⋃_{p∈C} Q(p) by R, and the program obtained by
rewriting P according to r ∈ R by P_r. We further denote the control point in
P_r corresponding to some p ∈ C in P by ins_r(p).

Algorithm 1 presents our algorithm to search for instrumentations that are
sufficient to verify a program P. The algorithm maintains a set Cand ⊆ R of
remaining ways to instrument P, and in each loop iteration considers one of the
remaining elements r ∈ Cand (line 4). If the oracle manages to verify P_r in line 5, due to
soundness of instrumentation the correctness of P has been shown (line 6); if
P_r is incorrect, there has to be a counterexample ending with a failing assertion
(line 8). There are two possible causes of assertion failures: if the failing assertion
in P_r already existed in P, then due to the weak completeness of instrumentation
also P has to be incorrect (line 10). Otherwise, the program instrumentation
has to be refined, and for this from Cand we remove all instrumentations r' that
agree with r regarding the instrumentation of the statements occurring in the
counterexample (line 13).

Since R is finite, and at least one element of Cand is eliminated in each 
iteration, the refinement loop terminates. The set Cand can be exponentially 
big, however, and therefore should be represented symbolically (using BDDs, or 
using an SMT solver managing the set of blocking constraints from line 13). 
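
The refinement loop can be sketched in Python with an explicit (non-symbolic) candidate set. Everything beyond the structure of Algorithm 1 is an assumption made for illustration: the function `search`, the shape of the oracle's return value, and the toy oracle with its control points `p`, `q` and rule labels are hypothetical.

```python
from itertools import product

INCORRECT = "Incorrect"
INCONCLUSIVE = "Inconclusive"
UNCHANGED = None  # plays the role of the value meaning "not rewritten"


def search(choices, is_correct):
    """Counterexample-guided search over an explicit candidate set.

    choices:    dict mapping each control point to its applicable rules
    is_correct: assumed oracle taking an instrumentation (dict) and returning
                (verified, failing_assertion_in_original, points_on_cex)
    """
    points = sorted(choices)
    cand = set(product(*(choices[p] for p in points)))  # line 2: Cand <- R
    while cand:                                          # line 3
        r = min(cand, key=repr)                          # line 4: any pick works
        verified, in_original, on_cex = is_correct(dict(zip(points, r)))
        if verified:
            return dict(zip(points, r))                  # line 6
        if in_original:
            return INCORRECT                             # line 10
        idx = [points.index(p) for p in on_cex]          # line 12
        # line 13: drop every candidate agreeing with r on all of these points
        cand = {c for c in cand if any(c[k] != r[k] for k in idx)}
    return INCONCLUSIVE                                  # line 17


def toy_oracle(r):
    """Hypothetical oracle: only p -> R_a together with q -> R_c verifies."""
    if r["p"] == "R_a" and r["q"] == "R_c":
        return True, False, set()
    if r["q"] == "R_b":
        return False, False, {"q"}       # a guard added at q fails
    return False, False, {"p", "q"}      # a guard fails on a path via p and q


result = search({"p": ["R_a", UNCHANGED], "q": ["R_b", "R_c"]}, toy_oracle)
```

In this toy run the first counterexample only involves `q`, so a single refinement step discards both candidates that use `R_b` at `q`, regardless of their choice at `p`. The explicit set is used here for readability only; it grows exponentially with the number of control points.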

We can observe soundness and completeness of the algorithm w.r.t. the con- 
sidered instrumentation operators (proof in [4]): 


Lemma 1 (Correctness of Algorithm 1). If Algorithm 1 returns an instrumentation
r ∈ R, then P_r and P are correct. If Algorithm 1 returns Incorrect,
then P is incorrect. If there is r ∈ R such that P_r is correct, then Algorithm 1
will return r' such that P_{r'} is correct.


4 Instrumentation Operators for Arrays 


4.1 Instrumentation Operators for Quantification over Arrays 


To handle quantifiers in a programming setting, we extend the language defined 
in Table 1 by adding quantified expressions over arrays, as shown in Table 2. As 
seen, we also extend the language with a lambda expression over two variables. 
The rationale for this is that many quantified properties can be expressed as a 
binary predicate with the first argument corresponding to the value of an element 
and the second to the index. This allows us to express properties over both the 
value of an element and its index. For example, we can express that each element 

Int N = nondet;
assume(N > 0);
Array Int a = const(0, N);
Int i = 0;
while (i < N) {
  a = store(a, i, i);
  i = i + 1;
}
Bool b = forall(a, 0, N, λ(x,i).(x == i));
assert(b);


Fig. 3. Example of program to be verified using a quantified assert statement.


should be equal to its index, as is done in the example program in Fig. 3. In the 
program, each element in the array is assigned the value corresponding to its 
index, after which it is asserted that this property indeed holds. 

Using P(x_0, i_0) as shorthand for (λ(x,i).P)(x_0, i_0), the new expressions
can be defined formally as:


\[ [\![\mathtt{forall}(a, l, u, \lambda(x,i).P)]\!]_s \;=\; \forall i \in [l, u).\ [\![P(a[i], i)]\!]_s \]
\[ [\![\mathtt{exists}(a, l, u, \lambda(x,i).P)]\!]_s \;=\; \exists i \in [l, u).\ [\![P(a[i], i)]\!]_s \]


Note that the types of x and a must be compatible and P be a Boolean-valued 
expression. 
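
Read operationally, the two quantified expressions amount to a bounded loop over the half-open interval [l, u). A minimal Python sketch, assuming that arrays are modelled as Python lists and that the predicate is passed as a two-argument function taking the element value first and the index second:

```python
def forall(a, l, u, pred):
    """True iff pred(a[i], i) holds for every index i in [l, u)."""
    return all(pred(a[i], i) for i in range(l, u))


def exists(a, l, u, pred):
    """True iff pred(a[i], i) holds for some index i in [l, u)."""
    return any(pred(a[i], i) for i in range(l, u))
```

For an empty interval (l >= u), `forall` yields true and `exists` yields false, matching the usual reading of quantifiers over an empty range.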

To handle programs such as the one in Fig. 3, we turn to the instrumentation 
framework outlined in Sect. 2.2, which we use here to define an instrumentation 
operator for universal quantification. The general idea is to instrument programs 
with a ghost variable, tracking if some predicate holds for all elements in an 
interval of the array, with shadow variables representing the tracked array, and 
the bounds of the interval. Naturally, an instrumentation operator for existential 
quantification can be defined in a similar fashion. For simplicity, we shall assume 
a normal form of programs, into which every program can be rewritten by intro- 
ducing additional variables. In the normal form, store, select and forall can 
only occur in simple assignment statements. For example, stores are restricted 
to occur in statements of the form: a' = store(a, i, x).

Over such normalised programs, and for a universally quantified expression
forall(a, l, u, λ(x,i).P), we define the instrumentation operator
Ω_{∀,P} = (G_{∀,P}, R_{∀,P}, I_{∀,P}) as shown in Fig. 4 over four ghost variables. The array
over which quantification occurs is tracked by qu_ar, and the variables qu_lo,
qu_hi represent the bounds of the currently tracked interval. The result of the
quantified expression is tracked by qu_P, whose value is true iff P holds for
all elements in a in the interval [qu_lo, qu_hi). The rewrite rules for stores,
selects and assignments of universally quantified expressions are then defined
as follows. For stores, the first if-branch resets the tracking to the one-element
interval [i, i + 1) when accessing elements far outside of the currently tracked
interval, or if we are tracking the empty interval (as is the case at initialisation).
If an access occurs immediately adjacent to the currently tracked interval




Ω_{∀,P} (Instrumentation operator)

  G_{∀,P} (Ghost variables)
    qu_ar: Array Int, qu_lo, qu_hi: Int, qu_P: Bool
    init(qu_ar) = [], init(qu_lo) = 0, init(qu_hi) = 0,
    init(qu_P) = true

  R_{∀,P} (Rewrite rules)

    a' = store(a, i, x)  ~>

      a' = store(a, i, x);
      if (qu_lo == qu_hi || i < qu_lo - 1 || i > qu_hi ||
          (P(x, i) && !qu_P && qu_lo <= i && i < qu_hi)) {
        qu_lo = i;           // Reset, because either:
        qu_hi = i + 1;       //  - tracking empty interval
        qu_P = P(x, i);      //  - storing far outside interval
      } else {               //  - possibly overwriting sole false
        assert(qu_ar == a);
        qu_P = qu_P && P(x, i);
        if (qu_lo - 1 == i) {
          qu_lo = i;         // Decrement lower bound by 1
        } else if (qu_hi == i) {
          qu_hi = i + 1;     // Increment upper bound by 1
        }
      }
      qu_ar = a';

    x = select(a, i)  ~>  similar to store

    b = forall(a, l, u, λ(x,i).P)  ~>

      if (l >= u) {
        b = true;
      } else {
        if (qu_P) {
          assert(qu_ar == a && l >= qu_lo && u <= qu_hi);
        } else {
          assert(qu_ar == a && l <= qu_lo && u >= qu_hi);
        }
        b = qu_P;
      }

  I_{∀,P} (Instrumentation invariant)
    qu_lo = qu_hi ∨
    (qu_lo < qu_hi ∧ qu_P = forall(qu_ar, qu_lo, qu_hi, λ(x,i).P))


Fig. 4. Definition of an instrumentation operator for universal quantification


(e.g., if i = qu_lo - 1), then that element is added to the tracked interval, and
the value of qu_P is updated to also account for the value of P at index i. If
instead the access is within the tracked interval, then we either reset the interval
(if qu_P is false) or keep the interval unchanged (if qu_P is true). Rewrites
of selects are similar to stores, except tracking does not need to be reset when
reading inside the tracked interval. For rewrites of quantified expressions, if the
quantified interval is empty, b is assigned true. Otherwise, assertions check that
the tracked interval matches the quantified interval before assigning qu_P to b.
If qu_P is true, then it is sufficient that quantification occurs over a sub-interval
of the tracked interval, and vice versa if qu_P is false.
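
The behaviour of the store and forall rewrites can be simulated in Python. The sketch below is an illustration under assumptions, not the instrumented program from [4]: arrays are modelled as dictionaries, the initial value of qu_ar is modelled as `None`, and the names `ForallTracker` and `run_fig3` are hypothetical. The driver mirrors the shape of the program in Fig. 3 for a fixed N.

```python
class ForallTracker:
    """Ghost state qu_ar, qu_lo, qu_hi, qu_P for a fixed predicate P."""

    def __init__(self, pred):
        self.pred = pred
        self.qu_ar = None
        self.qu_lo = 0
        self.qu_hi = 0
        self.qu_P = True

    def store(self, a, i, x):
        new_a = dict(a)
        new_a[i] = x                              # a' = store(a, i, x)
        lo, hi, P = self.qu_lo, self.qu_hi, self.pred
        if (lo == hi or i < lo - 1 or i > hi
                or (P(x, i) and not self.qu_P and lo <= i < hi)):
            # Reset tracking to the one-element interval [i, i + 1)
            self.qu_lo, self.qu_hi, self.qu_P = i, i + 1, P(x, i)
        else:
            assert self.qu_ar == a                # guard: a is the tracked array
            self.qu_P = self.qu_P and P(x, i)
            if lo - 1 == i:
                self.qu_lo = i                    # extend interval downwards
            elif hi == i:
                self.qu_hi = i + 1                # extend interval upwards
        self.qu_ar = new_a
        return new_a

    def forall(self, a, l, u):
        if l >= u:
            return True                           # empty quantified interval
        if self.qu_P:
            assert self.qu_ar == a and l >= self.qu_lo and u <= self.qu_hi
        else:
            assert self.qu_ar == a and l <= self.qu_lo and u >= self.qu_hi
        return self.qu_P


def run_fig3(n):
    """Hypothetical driver: fill a[i] = i for i < n, then query the tracker."""
    tracker = ForallTracker(lambda x, i: x == i)
    a = {k: 0 for k in range(n)}
    for i in range(n):
        a = tracker.store(a, i, i)
    return tracker, a, tracker.forall(a, 0, n)
```

Each loop iteration extends the tracked interval by one element, so the final query is answered from qu_P without inspecting the array again.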
The result of applying Ω_{∀,P} to the program in Fig. 3 is shown in [4]. As
exhibited by the experiments in Sect. 5, the resulting program is in many cases 
easier to verify by state-of-the-art verification tools. Note that the instrumenta- 
tion operator defined is only one possibility among many. For example, one could 
track several ranges simultaneously over the array in question, or also track the 
index of some element in the array over which P holds, or make different choices 
on stores outside of the tracked interval. 

The following lemma establishes correctness of the instrumentation operator. 
The proof can be found in [4]. 


Lemma 2 (Correctness of Ω_{∀,P}). Ω_{∀,P} is an instrumentation operator, i.e.,
it adheres to the constraints imposed in Definition 1.


4.2 Instrumentation Operators for Aggregation over Arrays 


We now turn to the verification of safety properties with aggregation. As exam- 
ples of aggregation, we consider in particular the operators \sum and \max, cal- 
culating the sum and maximum value of an array, respectively. Aggregation 
is supported in the form of extended quantifiers in the specification languages 
JML [33] and ACSL [7], and is frequently needed for the specification of func- 
tional correctness properties. Although commonly used, most verification tools 
do not support aggregation, so that properties involving aggregation have to 
be manually rewritten using standard quantifiers, pure recursive functions, or 
ghost code involving loops. This reduction step is error-prone, and represents an 
additional complication for automatic verification approaches, but can be han- 
dled elegantly using the instrumentation framework. For generality, we formalise 
aggregation over arrays with the help of monoid homomorphisms. 


Definition 4 (Monoid). A monoid is a structure (M, ∘, e) consisting of a non-empty
set M, a binary associative operation ∘ on M, and a neutral element e ∈ M.
A monoid is commutative if ∘ is commutative. A monoid is cancellative if
x ∘ y = x ∘ z implies y = z, and y ∘ x = z ∘ x implies y = z, for all x, y, z ∈ M.


For aggregation, we model finite intervals of arrays using the cancellative
monoid (D*, ·, ε) of finite sequences over some data domain D. The concatenation
operator · is non-commutative.


Definition 5 (Monoid Homomorphism). A monoid homomorphism is a
function h : M_1 → M_2 between monoids (M_1, ∘_1, e_1) and (M_2, ∘_2, e_2) with the
properties h(x ∘_1 y) = h(x) ∘_2 h(y) and h(e_1) = e_2.


Ordinary quantifiers can be modelled as homomorphisms D* → 𝔹, so that
the instrumentation in this section strictly generalizes Sect. 4.1. A second classical
example is the computation of the maximum (similarly, minimum) value
in a sequence. For the domain of integers, the natural monoid to use is the
algebra (ℤ_{-∞}, max, -∞) of integers extended with -∞,¹ and the homomorphism
h_max is generated by mapping singleton sequences ⟨n⟩ to the value n. A


¹ For machine integers, -∞ could be replaced with INT_MIN.
third example is the computation of the element sum of an integer sequence,
corresponding to the monoid (ℤ, +, 0) and the homomorphism h_sum. Similarly,
the number of occurrences of some element can be computed. The considered
monoid in the last two cases of aggregation is even cancellative.
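
The examples above can be sketched in Python by building each homomorphism from the target monoid and the image of singleton sequences. This is an illustrative sketch: the helper `homomorphism`, the use of `float("-inf")` for -∞, and the example predicate behind `h_all_even` are assumptions of this example.

```python
from functools import reduce


def homomorphism(op, unit, gen):
    """Homomorphism from finite sequences into the monoid (M, op, unit),
    generated by mapping the singleton sequence <n> to gen(n)."""
    return lambda seq: reduce(op, (gen(n) for n in seq), unit)


NEG_INF = float("-inf")  # stands in for the added element -infinity

h_max = homomorphism(max, NEG_INF, lambda n: n)
h_sum = homomorphism(lambda x, y: x + y, 0, lambda n: n)
# A universal quantifier as a homomorphism into the Booleans with "and"
h_all_even = homomorphism(lambda x, y: x and y, True, lambda n: n % 2 == 0)

xs, ys = [2, 9], [5]
# Concatenation on the left corresponds to the monoid operation on the right
max_law = h_max(xs + ys) == max(h_max(xs), h_max(ys))
sum_law = h_sum(xs + ys) == h_sum(xs) + h_sum(ys)
```

The empty sequence is mapped to the neutral element in each case, which is what makes the empty interval unproblematic for the instrumentation.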


Programming Language with Aggregation. We extend our core programming
language with expressions aggregate_{M,h}(⟨Expr⟩, ⟨Expr⟩, ⟨Expr⟩), and use
monoid homomorphisms to formalise them. Recall that we denote by D_σ the
domain of a program type σ.


Definition 6. Let Array σ be an array type, σ_M a program type, M a commutative
monoid that is a subset of D_{σ_M}, and h : D_σ* → M a monoid homomorphism.
Let furthermore ar be an expression of type Array σ, and l and u integer expressions.
Then, aggregate_{M,h}(ar, l, u) is an expression of type σ_M, with semantics
defined by:


\[ [\![\mathtt{aggregate}_{M,h}(ar, l, u)]\!]_s \;=\; h\bigl(\langle [\![ar]\!]_s([\![l]\!]_s),\ [\![ar]\!]_s([\![l]\!]_s + 1),\ \ldots,\ [\![ar]\!]_s([\![u]\!]_s - 1) \rangle\bigr) \]


Intuitively, the expression aggregate_{M,h}(ar, l, u) denotes the result of applying
the homomorphism h to the slice ar[l .. u - 1] of the array ar. As a convention,
in case u < l we assume that the result of aggregate is h(⟨⟩). As with array
accesses, we assume also that aggregate only occurs in normalised statements
of the form t = aggregate_{M,h}(ar, l, u).
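
A direct reading of this semantics in Python, assuming arrays are modelled as lists and the homomorphism is passed as an ordinary function on lists (the name `aggregate` and this calling convention are choices of the sketch):

```python
def aggregate(h, ar, l, u):
    """Apply h to the slice ar[l .. u-1]. An empty range, including the
    case u < l, yields h applied to the empty sequence."""
    return h([ar[k] for k in range(l, u)])


ar = [4, 8, 15, 16]
slice_sum = aggregate(sum, ar, 1, 3)   # sum over the elements 8 and 15
empty_sum = aggregate(sum, ar, 3, 1)   # u < l: h of the empty sequence
slice_max = aggregate(lambda s: max(s, default=float("-inf")), ar, 0, 4)
```

Since `range(l, u)` is empty whenever u <= l, the convention for u < l needs no special case here.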

In our examples, we use derived operations as found in ACSL: \max as short-hand
notation for aggregate_{(ℤ_{-∞}, max, -∞), h_max},² and \sum as short-hand
notation for aggregate_{(ℤ, +, 0), h_sum}.


An Instrumentation Operator for Maximum. For \max, an operator
Ω_max = (G_max, R_max, I_max) can be defined similarly to the operator Ω_{∀,P} from
Sect. 4.1, in that the maximum value in a particular interval of the array is
tracked. One key difference is that an extra ghost variable ag_max_idx is added
to track an array index where the maximum value of the array interval is stored,
in order to not have to reset tracking on every store inside of the tracked interval.
A complete definition is given in [4].


An Instrumentation Operator for Sum. Cancellative aggregation is aggre- 
gation based on a cancellative monoid. Cancellative aggregation makes it possi- 
ble to track aggregate values faithfully even when storing inside of the tracked 
interval, unlike \max and universal quantification. An example of a cancellative
operator is the aggregate \sum.

The instrumentation operator $\Omega_{\mathit{sum}} = (G_{\mathit{sum}}, R_{\mathit{sum}}, I_{\mathit{sum}})$ is defined in
Fig. 5. The instrumentation code tracks the sum of values in the interval, and


Footnote: With a slight abuse of the framework, we assume that $\mathbb{Z}_{-\infty}$ is represented by the
program type Int, mapping $-\infty$ to some fixed integer number. More elegant solutions
are not difficult to devise, but add unnecessary complexity.

$\Omega_{\mathit{sum}}$ (Instrumentation operator)

  $G_{\mathit{sum}}$ (Ghost variables)
    ag_lo, ag_hi, ag_sum: Int, ag_ar: Array Int
    init(ag_lo) = init(ag_hi) = init(ag_sum) = 0, init(ag_ar) = []

  $R_{\mathit{sum}}$ (Rewrite rules)
    a' = store(a, i, x) ~>
      a' = store(a, i, x);
      if (ag_lo == ag_hi || i < ag_lo - 1 || i > ag_hi) {
        ag_lo = i;             // Reset, because either:
        ag_hi = i + 1;         //  - tracking empty interval
        ag_sum = x;            //  - storing far outside interval
      } else {
        assert(ag_ar == a);
        if (ag_lo <= i && i < ag_hi) {
          // Subtract previous value from sum
          ag_sum = ag_sum - select(ag_ar, i);
        } else if (ag_lo - 1 == i) {
          ag_lo = i;           // Decrease lower bound by 1
        } else if (ag_hi == i) {
          ag_hi = i + 1;       // Increase upper bound by 1
        }
        ag_sum = ag_sum + x;   // Add new value to sum
      }
      ag_ar = a';

    x = select(a, i) ~> code similar to rewrites of store

    t = \sum(a, l, u) ~>
      if (u <= l) {
        t = 0;
      } else {
        assert(ag_ar == a && l == ag_lo && u == ag_hi);
        t = ag_sum;
      }

  $I_{\mathit{sum}}$ (Instrumentation invariant)
    ag_lo = ag_hi $\vee$ ag_sum = sum(ag_ar, ag_lo, ag_hi)

Fig. 5. Definition of an instrumentation operator $\Omega_{\mathit{sum}}$ for \sum


when increasing the bounds of the tracked interval, the new values are simply 
added to the tracked sum. Since \sum is cancellative, when storing inside of 
the tracked interval, the previous value at the index being written to is first 
subtracted from the sum, before adding the new value, ensuring that the correct 
aggregate value is computed. The following correctness result is proved in [4]. 
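
The bookkeeping described above can be simulated outside a verifier. The following Python sketch mirrors the store rewrite of the sum operator under stated assumptions: functional arrays are modelled as tuples, and the class `SumTracker` with its method names is hypothetical, introduced only for illustration.

```python
class SumTracker:
    """Ghost state for a tracked interval [lo, hi) of an array and its sum."""

    def __init__(self):
        self.lo = self.hi = self.total = 0
        self.ar = ()  # ghost copy of the array after the last store

    def store(self, a, i, x):
        """Functional store(a, i, x) on a tuple, updating the ghost state."""
        new_a = a[:i] + (x,) + a[i + 1:]
        if self.lo == self.hi or i < self.lo - 1 or i > self.hi:
            # Empty tracked interval, or store far outside it: restart tracking.
            self.lo, self.hi, self.total = i, i + 1, x
        else:
            assert self.ar == a, "ghost array must match the array being updated"
            if self.lo <= i < self.hi:
                self.total -= self.ar[i]   # cancel the overwritten value
            elif i == self.lo - 1:
                self.lo = i                # grow the interval downwards
            else:                          # here i == self.hi
                self.hi = i + 1            # grow the interval upwards
            self.total += x
        self.ar = new_a
        return new_a

    def invariant(self):
        """lo == hi, or the ghost sum equals the sum over the tracked slice."""
        return self.lo == self.hi or self.total == sum(self.ar[self.lo:self.hi])


tracker = SumTracker()
a = (0,) * 5
for i, x in enumerate([3, 1, 4, 1, 5]):
    a = tracker.store(a, i, x)     # interval grows to [0, 5)
a = tracker.store(a, 2, 10)        # overwrite inside the tracked interval
```

The final store relies on cancellation: the old value 4 is subtracted before 10 is added, so the ghost sum stays exact without a reset.
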


Lemma 3 (Correctness of $\Omega_{\mathit{sum}}$). $\Omega_{\mathit{sum}}$ is an instrumentation operator,
i.e., it adheres to the constraints imposed in Definition 1.


Deductive Verification of Instrumentation Operators. As stated in
Sect. 2.2, instrumentation operators may be verified independently of the
programs to be instrumented. The operators described in this paper, i.e.,
square, universal quantification, maximum, and sum, have been verified in the
verification tool Frama-C [15]. The verified instrumentations are adaptations for
the C language semantics and execution model. More specifically, the adapted
operators assume C native arrays, rather than functional ones.


5 Evaluation 


5.1 Implementation 


To evaluate our instrumentation framework, we have implemented the instru- 
mentation operators for quantifiers and aggregation over arrays. The implemen- 
tation is done over constrained Horn clauses (CHCs), by adding the rewrite rules 
defined in Sect. 4 to ELDARICA [30], an open-source solver for CHCs. We also
implemented the automatic application of the instrumentation operators, largely 
following Algorithm 1 but with a few minor changes due to the CHC setting. 
The CHC setting makes our implementation available to various CHC-based ver- 
ification tools, for instance JAYHORN (Java) [32], KORN (C) [19], RUSTHORN 
(Rust) [36], SEAHORN (C/LLVM) [26] and TRICERA (C) [20]. 

In order to evaluate our approach at the level of C programs, we extended 
TRICERA, an open-source assertion-based model checker that translates C pro- 
grams into a set of CHCs and relies on ELDARICA as back-end solver. TRICERA 
is extended to parse quantifiers and aggregation operators in its input C pro- 
grams and to encode them as part of the translation into CHCs. We call the 
resulting toolchain MONOCERA. An artefact that includes MONOCERA and the 
benchmarks is available online [5]. 

To handle complicated access patterns, for instance a program processing 
an array from the beginning and end at the same time, the implementation 
can apply multiple instrumentation operators simultaneously; the number of 
operators is incremented when Algorithm 1 returns Inconclusive. 


5.2 Experiments and Comparisons 


To assess our implementation, we assembled a test suite and carried out experi- 
ments comparing MONOCERA with the state-of-the-art C model checkers CPA- 
CHECKER 2.1.1 [11], SEAHORN 10.0.0 [26] and TRICERA 0.2. It should be noted 
that deductive verification frameworks, such as Dafny and Frama-C, can handle, 
for example, the program in Fig. 3 if they are provided with a manually written
loop invariant; however, since MONOCERA relies on automatic techniques for 
invariant inference, we only benchmark against tools using similar automatic 
techniques. We also excluded VERIABS [1], since its licence does not permit its 
use for scientific evaluation. 

The tools were set up, as far as possible, with equivalent configurations; for 
instance, to use the SMT-LIB theory of arrays [6] in order to model C arrays, and 
a mathematical (as opposed to machine) semantics of integers. CPACHECKER 
was configured to use k-induction [10], which was the only configuration that 
worked in our tests using mathematical integers. SEAHORN was run using the 
default settings. All tests were run on a Linux machine with an AMD Opteron 2220
SE @ 2.8 GHz and 6 GB RAM, with a timeout of 300 s.

Table 3. Results for MONOCERA (Mono), TRICERA (Tri), SEAHORN (Sea), and
CPACHECKER (CPA). For MONOCERA, statistics are also given for verification time
(s), size of the instrumentation search space, and search iterations.

                  Verification results       Ver. time (s)    Inst. space   Inst. steps
         #Tests   Mono   Tri   Sea   CPA   Min   Max   Avg     Max    Avg     Max   Avg
min          17      9     2     2     2    22    59    33      27     11      55    24
max          12      8     2     3     3    21   285    76     108     21      96    30
sum          26     16     3     3     3    26   245    78    2916    188     284    36
forall       96     30     1     0     2    14   236    91   59049   2446     334    59


Test Suite. The comparison uses a set of programs that compute properties
involving quantification and aggregation over arrays. The benchmarks
and verification results are summarised in Table 3. The benchmark suite
contains programs ranging from 16 to 117 LOC and comprises two parts:
(i) 117 programs taken from the SV-COMP repository [9], and (ii) 26 programs
crafted by the authors (min: 6, max: 8, sum: 9, forall: 3).

To construct the SV-COMP benchmark set for MONOCERA we gathered 
all test files from the directories prefixed with array or loop, and singled out 
programs containing some assert statement that could be rewritten using a quan- 
tifier or an aggregation operator over a single array. For example, loops 


for (int i = 0; i < N; i++) assert(a[i] <= 0); 


can be rewritten using forall or max operators. We created a benchmark for
each possible rewriting; for instance, in the case of max, by rewriting the loop
into assert(\max(a, 0, N) <= 0). The original benchmarks were used for the
evaluation of the other tools, none of which supported (extended) quantifiers.
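
The rewriting preserves the asserted property. As a rough illustration, the Python sketch below states the loop form and the \max form side by side; the function names are hypothetical, and the empty slice is assumed to aggregate to minus infinity, following the convention for \max.

```python
NEG_INF = float("-inf")


def loop_form(a, n):
    # Original benchmark shape: assert(a[i] <= 0) for every i in [0, n).
    return all(a[i] <= 0 for i in range(n))


def max_form(a, n):
    # Rewritten shape: assert(\max(a, 0, n) <= 0); empty slice gives -inf.
    return max(a[:n], default=NEG_INF) <= 0


SAMPLES = [[], [0], [-3, -1, 0], [-3, 2, 0], [1], [-1, -1, -1, 5]]
AGREE = all(loop_form(a, len(a)) == max_form(a, len(a)) for a in SAMPLES)
```
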

In (ii), we crafted 9 programs that make use of aggregation or quantifiers, 
and derived further benchmarks by considering different array sizes (10, 100 and 
unbounded size); one combination (unbounded array inside a struct) had to be 
excluded, as it is not valid C. In order to evaluate other tools on our crafted 
benchmarks, we reversed the process described for the SV-COMP benchmarks 
and translated the operators into corresponding loop constructs. 


Results. In Table 3, we present the number of verified programs per instrumenta- 
tion operator for each tool, as well as further statistics for MONOCERA regarding 
verification times and instrumentation search space. The “Inst. space” column 
indicates the size of the instrumentation search space (i.e., number of instrumen- 
tations producible by applying the non-deterministic instrumentation operator). 
The “Inst. steps” column indicates the number of attempted instrumentations, i.e.,
number of iterations in the while-loop in Algorithm 1. In our implementation, 
the check in Algorithm 1 line 5 can time out and cause the check to be repeated 
at a later time with a greater timeout, which can lead to more iterations than 
the size of the search space. In [4], we list results per benchmark for each tool.

For the SV-COMP benchmarks, CPACHECKER managed to verify 1 program,
while SEAHORN and TRICERA could not verify any programs. MONOCERA ver- 
ified in total 42 programs from SV-COMP. Regarding the crafted benchmarks, 
several tools could verify the examples with array size 10. However, when the 
array size was 100 or unbounded, only MONOCERA succeeded. 


6 Related Work 


It is common practice, in both model checking and deductive verification, to 
translate high-level specifications to low-level specifications prior to verification 
(e.g., [13,14,18,37]). Such translations often make use of ghost variables and 
ghost code, although relatively little systematic research has been done on the 
required properties of ghost code [22]. The addition of ghost variables to a pro- 
gram for tracking the value of complex expressions also has similarities with the 
concept of term abstraction in Horn solving [3]. To the best of our knowledge, 
we are presenting the first general framework for automatic program instrumen- 
tation. 

A lot of research in software model checking has considered the handling of standard
quantifiers $\forall$, $\exists$ over arrays. In the setting of constrained Horn clauses,
properties with universal quantifiers can sometimes be reduced to quantifier-free 
reasoning over non-linear Horn clauses [13,37]. Our approach follows the same 
philosophy of applying an up-front program transformation, but in a more gen- 
eral setting. Various direct approaches to infer quantified array invariants have 
been proposed as well: e.g., by extending the IC3 algorithm [27], syntax-guided 
synthesis [21], learning [24], by solving recurrence equations [29], backward reach- 
ability [3], or superposition [25]. To the best of our knowledge, such methods have 
not been extended to aggregation. 

Deductive verification tools usually have rich support for quantified spec- 
ifications, but rely on auxiliary assertions like loop invariants provided by the 
user, and on SMT solvers or automated theorem provers for quantifier reasoning. 
Although several deductive verification tools can parse extended quantifiers, few 
offer support for reasoning about them. Our work is closest to the method for 
handling comprehension operators in Spec# [35], which relies on code annota- 
tions provided by the user, but provides heuristics to automatically verify such 
annotations. The code instrumentation presented in this paper has similarity 
with the proof rules in Spec#; the main differences are that our method is based 
on an upfront program transformation, and that we aim at automatically find- 
ing required program invariants, as opposed to only verifying their correctness. 
The KeY tool provides proof rules similar to the ones in Spec# for some of the 
JML extended quantifiers [2]; those proof rules can be applied manually to verify 
human-written invariants. The Frama-C system [15] can parse ACSL extended 
quantifiers [7], but, to the best of our knowledge, none of the Frama-C plug- 
ins can automatically process such quantifiers. Other systems, e.g., Dafny [34], 
require users to manually define aggregation operators as recursive functions.

In the theory of algebraic data-types, several transformation-based approaches
have been proposed to verify properties that involve recursive functions or cata- 
morphisms [17,31]. Aggregation over arrays resembles the evaluation of recur- 
sive functions over data-types; a major difference is that data-types are more 
restricted with respect to accessing and updating data than arrays. 

Array folds logic (AFL) [16] is a decidable logic in which properties on arrays 
beyond standard quantification can be expressed: for instance, counting the num- 
ber of elements with some property. Similar properties can be expressed using 
automata on data words [41], or in variants of monadic second-order logic [38]. 
Such languages can be seen as alternative formalisms to aggregation or extended 
quantifiers; they do not cover, however, all kinds of aggregation we are interested 
in. Array sums cannot be expressed in AFL or data automata, for instance. 


7 Conclusion 


We have presented a framework for automatic and provably correct program 
instrumentation, allowing the automatic verification of programs containing cer- 
tain expressive language constructs, which are not directly supported by the 
existing automatic verification tools. Our experiments with a prototypical imple- 
mentation, in the tool MONOCERA, show that our method is able to automati- 
cally verify a significant number of benchmark programs involving quantification 
and aggregation over arrays that are beyond the scope of other tools. 

There are still various other benchmarks that MONOCERA (as well as other 
tools) cannot verify. We believe that many of those benchmarks are in reach of 
our method, because of the generality of our approach. Ghost code is known 
to be a powerful specification mechanism; similarly, in our setting, more pow- 
erful instrumentation operators can be easily formulated for specific kinds of 
programs. In future work, we therefore plan to develop a library of instrumenta- 
tion operators for different language constructs (including arithmetic operators), 
non-linear arithmetic, other types of structures with regular access patterns such 
as binary heaps, and general linked-data structures. 

We also plan to refine our method for showing incorrectness of programs 
more efficiently, as the approach is currently applicable mainly for verifying 
correctness (experiments in [4]). Another line of work is the establishment of 
stronger completeness results than the weak completeness result presented here, 
for specific programming language fragments.

Abstract. In this paper, we address the problem of the (reactive) realizability
of specifications of theories richer than Booleans, including arithmetic
theories. Our approach transforms theory specifications into purely Boolean
specifications by (1) substituting theory literals by Boolean variables, and
(2) computing an additional Boolean requirement that captures the dependencies
between the new variables imposed by the literals. The resulting specification
can be passed to existing Boolean off-the-shelf realizability tools, and is
realizable if and only if the original specification is realizable. The first
contribution is a brute-force version of our method, which requires a number of
SMT queries that is doubly exponential in the number of input literals. Then,
we present a faster method that exploits a nested encoding of the search for
the extra requirement, using SAT solving to traverse the search space faster
and SMT queries internally. Another contribution is a prototype in Z3-Python.
Finally, we report an empirical evaluation using specifications inspired by
real industrial cases. To the best of our knowledge, this is the first method
that succeeds in non-Boolean LTL realizability.


1 Introduction 


Reactive synthesis [30,31] is the problem of automatically producing a system 
that is guaranteed to model a given temporal specification, where the Boolean 
variables (i.e., atomic propositions) are split into variables controlled by the 
environment and variables controlled by the system. Realizability is the related 
decision problem of deciding whether such a system exists. These problems have
been widely studied [17,21], especially in the domain of Linear Temporal Logic
(LTL) [29]. Realizability corresponds to infinite games where players alternately
choose the valuations of the Boolean variables they control. The winning
condition is extracted from the temporal specification and determines which
player wins a given play. A specification is realizable if and only if the system player
has a winning strategy, i.e., if there is a way to play such that the specification
is satisfied in all plays played according to the strategy.

However, in practice, many real and industrial specifications use complex 
data beyond Boolean atomic propositions, which precludes the direct use of 
realizability tools. These specifications cannot be written in (propositional) LTL, 
but instead use literals from a richer domain. We use $\mathrm{LTL}_{\mathcal{T}}$ for the extension
of LTL where Boolean atomic propositions can be literals from a (multi-sorted)
first-order theory $\mathcal{T}$. The $\mathcal{T}$ variables (i.e., non-Boolean) in the specification are
again split into those controlled by the system and those controlled by the
environment. The resulting realizability problem also corresponds to infinite games,
but, in this case, players choose valuations from the domains of $\mathcal{T}$, which may
be infinite. Therefore, arenas may be infinite and positions may have infinitely
many successors. In this paper, we present a method that transforms a specification
that uses data from a theory $\mathcal{T}$ into an equi-realizable Boolean specification.
The resulting specification can then be processed by an off-the-shelf realizability 
tool. 

The main element of our method is a novel Boolean abstraction method,
which allows us to transform $\mathrm{LTL}_{\mathcal{T}}$ specifications into pure (Boolean) LTL
specifications. The method first substitutes all $\mathcal{T}$ literals by fresh Boolean variables
controlled by the system, and then extends the specification with an additional
sub-formula that constrains the combinations of values of these variables. This method
is described in Sect. 3. The main idea is that, after the environment selects val- 
ues for its (data) variables, the system responds with values for the variables 
it controls, which induces a Boolean value for all the literals. The additional 
formula we compute captures the set of possible valuations of literals and the 
precise power of each player to produce each valuation. 


Example 1. Consider the following specification $\varphi = \square(R_0 \wedge R_1)$, where:


$R_0 : (x < 2) \to \bigcirc(y > 1)$        $R_1 : (x \geq 2) \to (y < x)$


where $x$ is a numeric variable that belongs to the environment and $y$ to the system.
In the game corresponding to this specification, each player has an infinite
number of choices at each time step. For example, in $\mathcal{T}_{\mathbb{Z}}$ (the theory of integers),
the environment player chooses an integer for x and the system responds with 
an integer for y. This induces a valuation of all literals in the formula, which in 
turn induces (also considering the valuations of the literals at other time instants, 
according to the temporal operators) a valuation of the full specification. 

In this paper, we exploit that, from the point of view of the valuations of 
the literals, there are only finitely many cases and provide a systematic man- 
ner to compute these cases. This allows us to reduce a specification into a 
purely Boolean specification that is equi-realizable. This specification encodes 
the (finite) set of decisions of the environment, and the (finite) set of reactions 
of the system. 


Example 1 suggests a naive algorithm to capture the powers of the environment
and system to determine a combination of the valuations of the literals, by
enumerating all these combinations and checking the validity of each potential
reaction. Checking that a given combination is a possible reaction requires an
$\exists^*\forall^*$ query (which can be delegated to an SMT solver for appropriate theories).

In this paper, we describe and prove correct a Boolean abstraction method 
based on this idea. Then, we propose a more efficient search method for the set 
of possible reactions using SAT solving to speed up the exploration of the set of 
reactions. The main idea of this faster method is to learn from an invalid reaction 
which other reactions are guaranteed to be invalid, and from a valid reaction 
which other reactions are not worth being explored. We encode these learnt
sets as an incremental SAT formula that allows us to prune the search space. The
resulting method is much more efficient than brute-force enumeration because, 
in each iteration, the learning can prune an exponential number of cases. An 
important technical detail is that computing the set of cases to be pruned from 
the outcome of a given query can be described efficiently using a SAT solver. 

In summary, our contributions are: (1) a proof that realizability is decidable
for all $\mathrm{LTL}_{\mathcal{T}}$ specifications for those theories $\mathcal{T}$ with a decidable $\exists^*\forall^*$ fragment;
(2) a simple implementation of the resulting Boolean abstraction method; (3)
a much faster method based on a nested-SAT implementation of the Boolean
abstraction method that efficiently explores the search space of potential
reactions; and (4) an empirical evaluation of these algorithms, where our early
findings suggest that Boolean abstractions can be used with specifications
containing different arithmetic theories, and also with industrial specifications. We used
Z3 [10] both as an SMT solver and a SAT solver, and Strix [27] as the realizability
checker. To the best of our knowledge, this is the first method that succeeds,
and does so efficiently, in non-Boolean LTL realizability.


2 Preliminaries 
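We study realizability of LTL [26,29] specifications. The syntax of LTL is:


$$\varphi ::= \top \mid a \mid \varphi \vee \varphi \mid \neg\varphi \mid \bigcirc\varphi \mid \varphi \,\mathcal{U}\, \varphi$$


where $a$ ranges over a set of atomic propositions $AP$, $\vee$, $\wedge$ and $\neg$ are the usual
Boolean disjunction, conjunction and negation, and $\bigcirc$ and $\mathcal{U}$ are the next and
until temporal operators. The semantics of LTL associates traces $\sigma \in \Sigma^\omega$ with
formulae as follows:


$\sigma \models \top$   always
$\sigma \models a$   iff   $a \in \sigma(0)$
$\sigma \models \varphi_1 \vee \varphi_2$   iff   $\sigma \models \varphi_1$ or $\sigma \models \varphi_2$
$\sigma \models \neg\varphi$   iff   $\sigma \not\models \varphi$
$\sigma \models \bigcirc\varphi$   iff   $\sigma^1 \models \varphi$
$\sigma \models \varphi_1 \,\mathcal{U}\, \varphi_2$   iff   for some $i \geq 0$, $\sigma^i \models \varphi_2$, and for all $0 \leq j < i$, $\sigma^j \models \varphi_1$


We use common derived operators like $\wedge$, $\mathcal{R}$, $\Diamond$ and $\square$.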
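
The semantics can be executed directly on ultimately periodic traces, i.e., a finite prefix followed by a loop repeated forever. The Python sketch below is an illustration under that restriction only; the tuple encoding of formulae and the names `holds`, `eventually` and `always` are hypothetical. Since such a trace has at most len(prefix) + len(loop) distinct suffixes, the until operator only needs to inspect that many positions.

```python
def holds(formula, prefix, loop, position=0):
    """Evaluate an LTL formula on the trace prefix . loop^omega.

    Formulae are nested tuples: ("true",), ("ap", name), ("or", f, g),
    ("not", f), ("next", f), ("until", f, g). Letters are sets of names.
    """
    n = len(prefix) + len(loop)

    def letter(k):
        if k < len(prefix):
            return prefix[k]
        return loop[(k - len(prefix)) % len(loop)]

    def sat(f, k):
        op = f[0]
        if op == "true":
            return True
        if op == "ap":
            return f[1] in letter(k)
        if op == "or":
            return sat(f[1], k) or sat(f[2], k)
        if op == "not":
            return not sat(f[1], k)
        if op == "next":
            return sat(f[1], k + 1)
        if op == "until":
            # n consecutive positions cover every distinct suffix from k on.
            for j in range(k, k + n):
                if sat(f[2], j):
                    return True
                if not sat(f[1], j):
                    return False
            return False
        raise ValueError(f"unknown operator: {op}")

    return sat(formula, position)


def eventually(f):
    return ("until", ("true",), f)


def always(f):
    return ("not", eventually(("not", f)))


PREFIX = [{"a"}]           # position 0
LOOP = [{"b"}, set()]      # positions 1, 2, 1, 2, ...
```

For instance, `holds(always(eventually(("ap", "b"))), PREFIX, LOOP)` asks whether b holds infinitely often on this trace.
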
Reactive synthesis [4,5,14,28,33] is the problem of producing a system from
an LTL specification, where the atomic propositions are split into propositions
that are controlled by the environment and those that are controlled by the
system. Synthesis corresponds to a turn-based game where, in each turn, the
environment produces values of its variables (inputs) and the system responds
with values of its variables (outputs). A play is an infinite sequence of turns. The
system player wins a play according to an LTL formula $\varphi$ if the trace of the play
satisfies $\varphi$. A (memory-less) strategy of a player is a map from positions to
moves of the player. A play is played according to a strategy if all the moves
of the corresponding player are played according to the strategy. A strategy is
winning for a player if all the possible plays played according to the strategy are
winning.

Depending on the fragment of LTL used, the synthesis problem has different 
complexities. The method that we present in this paper generates a formula 
in the same temporal fragment as the original formula (e.g., starting from a 
safety formula another safety formula is generated). The generated formula is
passed to a solver capable of solving formulas in the right fragment. For
simplicity in the presentation, we illustrate our method with safety formulae.

We use $\mathrm{LTL}_{\mathcal{T}}$ as the extension of LTL where propositions are replaced by
literals from a first-order theory $\mathcal{T}$. In realizability for $\mathrm{LTL}_{\mathcal{T}}$, the variables that
occur in the literals of a specification $\varphi$ are split into those variables controlled
by the environment (denoted by $\overline{v}_e$) and those controlled by the system ($\overline{v}_s$),
where $\overline{v}_e \cap \overline{v}_s = \emptyset$. We use $\varphi(\overline{v}_e, \overline{v}_s)$ to remark that $\overline{v}_e \cup \overline{v}_s$ are the variables
occurring in $\varphi$. The alphabet $\Sigma_{\mathcal{T}}$ now consists of valuations of the variables in $\overline{v}_e \cup \overline{v}_s$.
A trace is an infinite sequence of valuations, which induces an infinite sequence
of Boolean values of the literals occurring in $\varphi$ and, in turn, a valuation of the
temporal formula.

Realizability for $\mathrm{LTL}_{\mathcal{T}}$ corresponds to an infinite game with an infinite arena
where positions may have infinitely many successors if the ranges of the variables
controlled by the system and the environment are infinite. For instance, in Ex. 1
with $\mathcal{T} = \mathcal{T}_{\mathbb{Z}}$, valuations range over infinitely many values, and literal $(x \geq 2)$ can be
satisfied with $x = 2$, $x = 3$, etc.

Arithmetic theories are a particular class of first-order theories. Even though
our Boolean abstraction technique is applicable to any theory with a decidable
$\exists^*\forall^*$ fragment, we illustrate our technique with arithmetic specifications.
Concretely, we will consider $\mathcal{T}_{\mathbb{Z}}$ (i.e., linear integer arithmetic) and $\mathcal{T}_{\mathbb{R}}$ (i.e., non-linear
real arithmetic). Both theories have a decidable $\exists^*\forall^*$ fragment. Note that the
choice of the theory influences the realizability of a given formula.


Example 2. Consider Ex. 1. The formula $\varphi = \square(R_0 \wedge R_1)$ is not realizable for $\mathcal{T}_{\mathbb{Z}}$,
since, if at a given instant $t$, the environment plays $x = 0$ (and hence $x < 2$ is
true), then $y$ must be greater than 1 at time $t+1$. Then, if at $t+1$ the environment
plays $x = 2$ then $(x \geq 2)$ is true but there is no $y$ such that both $(y > 1)$ and
$(y < 2)$. However, for $\mathcal{T}_{\mathbb{R}}$, $\varphi$ is realizable (consider the system strategy to always
play $y = 1.5$).

The following slight modification of Ex. 1 alters its realizability ($R_1'$ replaces
$R_1$, using the $\mathcal{T}$-predicate $y \leq x$ instead of $y < x$):


$R_0 : (x < 2) \to \bigcirc(y > 1)$        $R_1' : (x \geq 2) \to (y \leq x)$

Now, $\varphi' = \square(R_0 \wedge R_1')$ is realizable for both $\mathcal{T}_{\mathbb{Z}}$ and $\mathcal{T}_{\mathbb{R}}$, as the strategy of the
system to always pick $y = 2$ is winning in both theories.
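
The claims of Example 2 can be spot-checked on finite samples of environment inputs. The Python sketch below is such a check and not a proof: the function names are hypothetical, real-valued inputs are sampled as rationals via `Fraction`, and only constant system strategies are tested.

```python
from fractions import Fraction


def step_ok(x_prev, x, y, strict):
    """Check both requirements at one time step.

    x_prev is the environment input of the previous step (None at the
    first step); strict selects (y < x) from R1, otherwise (y <= x).
    """
    r0 = x_prev is None or not (x_prev < 2) or y > 1      # (x < 2) -> next (y > 1)
    r1 = not (x >= 2) or (y < x if strict else y <= x)     # (x >= 2) -> (y < x)
    return r0 and r1


def constant_strategy_wins(y, xs, strict):
    """True if always answering y is safe for every sampled pair of inputs."""
    return all(step_ok(xp, x, y, strict) for xp in [None, *xs] for x in xs)


REALS = [Fraction(k, 4) for k in range(-20, 41)]   # samples in [-5, 10]
INTS = list(range(-10, 11))
```

With these samples, the constant answer 3/2 is safe for the strict requirement over the rationals, whereas no integer answer in a bounded range survives the input sequence x = 0 followed by x = 2.
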


3 Boolean Abstraction 


We solve the realizability problem modulo theories by transforming the specification
into an equi-realizable Boolean specification. Given a specification $\varphi$
with literals $l_i$, we get a new specification $\varphi[l_i \leftarrow s_i] \wedge \square\varphi^{extra}$, where $s_i$ are
fresh Boolean variables and $\varphi^{extra} \in \mathrm{LTL}_{\mathbb{B}}$ is a Boolean formula (without temporal
operators). The additional sub-formula $\varphi^{extra}$ uses the freshly introduced
variables $s_i$ controlled by the system, as well as additional Boolean variables
$\overline{e}$ controlled by the environment, and captures the precise combined power of
the players to decide the valuations of the literals in the original formula. We
call our approach Booleanization or Boolean abstraction. The approach is summarized
in Fig. 1: given an $\mathrm{LTL}_{\mathcal{T}}$ specification $\varphi_{\mathcal{T}}$, it is translated into a Boolean
specification $\varphi_{\mathbb{B}}$, which can be analyzed with off-the-shelf realizability checkers. Note that $G^{\mathbb{B}}$
and $G^{\mathcal{T}}$ are the games constructed from specifications $\varphi_{\mathbb{B}}$ and $\varphi_{\mathcal{T}}$, respectively.
Also, note that [20] shows that we can construct a game $G$ from a specification
$\varphi$ and that $\varphi$ is realizable if and only if $G$ is winning for the system.


[Figure: Booleanization maps $\varphi_{\mathcal{T}}$ to $\varphi_{\mathbb{B}}$, which is passed to a realizability tool; the correctness argument is Thm. 1.]


Fig. 1. The tool chain with the correctness argument.


The Booleanization procedure constructs an extra requirement $\varphi^{extra}$ and
conjoins $\square\varphi^{extra}$ with the formula $\varphi[l_i \leftarrow s_i]$. In a nutshell, after the environment
chooses a valuation of the variables it controls (including $\overline{e}$), the system responds
with valuations of its variables (including $s_i$), which induces a Boolean value for
all literals. Therefore, for each possible choice of the environment, the system has 
the power to choose a Boolean response among a specific collection of responses 
(a subset of all the possible combinations of Boolean valuations of the literals). 
Since the set of all possible responses is finite, so are the different cases. The extra 
requirement captures precisely the finite collection of choices of the environment 
and the resulting finite collection of responses of the system for each case. 


3.1 Notation 


In order to explain the construction of the extra requirement, we introduce some
preliminary definitions. We will use Ex. 1 as the running example.

A literal is an atom or its negation, regardless of whether the atom is a
Boolean variable or a predicate of a theory. Let $Lit(\varphi)$ be the collection of
literals that appear in $\varphi$ (or $Lit$, if the formula is clear from the context). For
simplicity, we assume that all literals belong to the same theory, but each theory
can be Booleanized in turn, as each literal belongs to exactly one theory and we
assume in this paper that literals from different theories do not share variables.
We will use $\overline{x}$ for the environment-controlled variables occurring in $Lit(\varphi)$ and $\overline{y}$
for the variables controlled by the system.

In Ex. 1, we first translate the literals in $\varphi$. Since $(x < 2)$ is equivalent to
$\neg(x \geq 2)$, we use a single Boolean variable for both. The substitution is:


$(x < 2) \leftarrow s_0$        $(y > 1) \leftarrow s_1$        $(y < x) \leftarrow s_2$
$(x \geq 2) \leftarrow \neg s_0$     $(y \leq 1) \leftarrow \neg s_1$     $(y \geq x) \leftarrow \neg s_2$


After the substitution we obtain $\varphi'' = \square(R_0^{\mathbb{B}} \wedge R_1^{\mathbb{B}})$ where


$R_0^{\mathbb{B}} : s_0 \to \bigcirc s_1$        $R_1^{\mathbb{B}} : \neg s_0 \to s_2$


Note that $\varphi''$ may not be equi-realizable to $\varphi$, as we may be giving too much
power to the system if $s_0$, $s_1$ and $s_2$ are chosen independently without restriction.
Indeed, $\varphi''$ is realizable, for example by always choosing $s_1$ and $s_2$ to be true,
but $\varphi$ is not realizable in $\mathrm{LTL}_{\mathcal{T}_{\mathbb{Z}}}$. This justifies the need for an extra sub-formula.


Definition 1 (Choice). A choice $c \subseteq Lit(\varphi)$ is a subset of the literals of $\varphi$.


The intended meaning of a choice is to capture what literals are true in the
choice, while the rest (i.e., $Lit \setminus c$) are false. Once the environment picks values
for $\overline{x}$, the system can realize some choice $c$ by selecting $\overline{y}$ and making the literals
in $c$ true (and the rest false). However, for some values of $\overline{x}$, some choices may
not be possible for the system for any $\overline{y}$. Given a choice $c$, we use $f(c(\overline{x}, \overline{y}))$ to
denote the formula:

$$\bigwedge_{l \in c} l \;\wedge\; \bigwedge_{l \notin c} \neg l$$

which is a formula with variables $\overline{x}$ and $\overline{y}$ that captures logically the set of values
of $\overline{x}$ and $\overline{y}$ that realize precisely choice $c$. We use $\mathcal{C}$ for the set of choices. Note
that there are $|\mathcal{C}| = 2^{|Lit|}$ different choices. We call the elements of $\mathcal{C}$ choices
because they may be at the disposal of the system to choose by picking the right
values of its variables.

A given choice $c$ can act as potential (meaning that the response is possible)
or as antipotential (meaning that the response is not possible). A potential is
a formula (that depends only on $\overline{x}$) that captures those values of $\overline{x}$ for which
the system can respond and make precisely the literals in $c$ true (and the rest of
the literals false). The negation of the potential (i.e., an antipotential) captures
precisely those values of $\overline{x}$ for which there are no values of $\overline{y}$ that lead to $c$.


Definition 2 (Potential and Antipotential). Given a choice $c$, a potential
is the following formula $c^p$ and an antipotential is the following formula $c^a$:


$$c^p(\overline{x}) = \exists\overline{y}.\, f(c(\overline{x}, \overline{y})) \qquad c^a(\overline{x}) = \forall\overline{y}.\, \neg f(c(\overline{x}, \overline{y}))$$

Example 3. We illustrate two choices for Ex. 1. Consider choices $c_0 = \{(x < 2), (y > 1), (y < x)\}$
and $c_1 = \{(x < 2), (y > 1)\}$. Choice $c_0$ corresponds to
$f(c_0) = (x < 2) \wedge (y > 1) \wedge (y < x)$, that is, literals $(x < 2)$, $(y > 1)$ and $(y < x)$
are true. Choice $c_1$ corresponds to $f(c_1) = (x < 2) \wedge (y > 1) \wedge (y \geq x)$, that is,
literals $(x < 2)$ and $(y > 1)$ being true and $(y < x)$ being false (i.e., $(y \geq x)$
being true). It is easy to see the meaning of $c_2$, $c_3$ etc. Then, the potential and
antipotential formulae of e.g., choices $c_0$ and $c_1$ from Ex. 1 are as follows:


$c_0^p = \exists y.\, (x < 2) \wedge (y > 1) \wedge (y < x)$        $c_0^a = \forall y.\, \neg((x < 2) \wedge (y > 1) \wedge (y < x))$
$c_1^p = \exists y.\, (x < 2) \wedge (y > 1) \wedge (y \geq x)$        $c_1^a = \forall y.\, \neg((x < 2) \wedge (y > 1) \wedge (y \geq x))$


Note that potentials and antipotentials have $\overline{x}$ as the only free variables.


Depending on the theory, the validity of potentials and antipotentials may be
different. For instance, consider $c_0^p$ and theories $\mathcal{T}_{\mathbb{Z}}$ and $\mathcal{T}_{\mathbb{R}}$:

- In $\mathcal{T}_{\mathbb{Z}}$: $\exists y.\, (x < 2) \wedge (y > 1) \wedge (y < x)$ is equivalent to false.
- In $\mathcal{T}_{\mathbb{R}}$: $\exists y.\, (x < 2) \wedge (y > 1) \wedge (y < x)$ is equivalent to $(1 < x) \wedge (x < 2)$.

These equivalences can be obtained using classic quantifier elimination procedures,
e.g., with Cooper's algorithm [9] for $\mathcal{T}_{\mathbb{Z}}$ and Tarski's method [32] for $\mathcal{T}_{\mathbb{R}}$.
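The difference between the two theories can be checked informally with a few lines of code. The sketch below is not a quantifier elimination procedure: it searches a bounded integer window for a pair realizing $c_0$ (finding none) and exhibits one rational witness, with rationals standing in for reals. The window bounds and the witness are assumptions of this sketch.

```python
from fractions import Fraction

def f_c0(x, y):
    """f(c0) from Ex. 3: all three literals hold."""
    return x < 2 and y > 1 and y < x

# Integers: search a bounded grid for any (x, y) realizing c0.
INT_RANGE = range(-20, 21)
int_witnesses = [(x, y) for x in INT_RANGE for y in INT_RANGE if f_c0(x, y)]

# Rationals (standing in for reals): a single witness is enough.
rational_witness = (Fraction(3, 2), Fraction(5, 4))

print(int_witnesses)               # no integer pair in the window realizes c0
print(f_c0(*rational_witness))     # the rational pair does
```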

A reaction is a description of the specific choices that the system has the 
power to choose. 


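Definition 3 (Reaction). Let $P$ and $A$ be a partition of $\mathcal{C}$, that is: $P \subseteq \mathcal{C}$,
$A \subseteq \mathcal{C}$, $P \cap A = \emptyset$ and $P \cup A = \mathcal{C}$. The reaction $\mathit{react}_{(P,A)}$ is as follows:

\[ \mathit{react}_{(P,A)}(\overline{x}) \;\stackrel{\text{def}}{=}\; \bigwedge_{c \in P} c^p \;\wedge\; \bigwedge_{c \in A} c^a \]

The reaction $\mathit{react}_{(P,A)}$ is equivalent to:

\[ \mathit{react}_{(P,A)}(\overline{x}) = \bigwedge_{c \in P} \big(\exists \overline{y}.\, f(c(\overline{x},\overline{y}))\big) \;\wedge\; \bigwedge_{c \in A} \big(\forall \overline{y}.\, \neg f(c(\overline{x},\overline{y}))\big). \]

There are $2^{2^{|\mathit{Lit}|}}$ different reactions.

A reaction $r$ is called valid whenever there is a move of the environment for
which $r$ captures precisely the power of the system, that is exactly which choices
the system can choose. Formally, a reaction is valid whenever $\exists \overline{x}.\, r(\overline{x})$ is a valid
formula. We use $\mathcal{R}$ for the set of reactions and $\mathit{VR}$ for the set of valid reactions.
It is easy to see that, for all possible valuations of $\overline{x}$ the environment can pick,
the system has a specific power to respond (among the finitely many cases).
Therefore, the following formula is valid:

\[ \varphi_{\mathit{VR}} = \forall \overline{x}.\, \bigvee_{r \in \mathit{VR}} r(\overline{x}). \]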
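Example 4. In Ex. 1, for theory $\mathcal{T}_{\mathbb{Z}}$, we find there are two valid reactions (using
choices from Ex. 3):

\[ r_1 : \exists x.\; c_0^a \wedge c_1^p \wedge c_2^p \wedge c_3^p \wedge c_4^a \wedge c_5^a \wedge c_6^a \wedge c_7^a \]
\[ r_2 : \exists x.\; c_0^a \wedge c_1^a \wedge c_2^a \wedge c_3^a \wedge c_4^a \wedge c_5^p \wedge c_6^p \wedge c_7^a \]

where reaction $r_1$ models the possible responses of the system after the environment
picks a value for $x$ with $(x < 2)$, whereas $r_2$ models the responses to
$(x \geq 2)$. On the other hand, for $\mathcal{T}_{\mathbb{R}}$, there are three valid reactions:

\[ r_1 : \exists x.\; c_0^a \wedge c_1^p \wedge c_2^p \wedge c_3^p \wedge c_4^a \wedge c_5^a \wedge c_6^a \wedge c_7^a \]
\[ r_2 : \exists x.\; c_0^p \wedge c_1^p \wedge c_2^p \wedge c_3^p \wedge c_4^a \wedge c_5^a \wedge c_6^a \wedge c_7^a \]
\[ r_3 : \exists x.\; c_0^a \wedge c_1^a \wedge c_2^a \wedge c_3^a \wedge c_4^p \wedge c_5^p \wedge c_6^p \wedge c_7^a \]

Note that there is one valid reaction more, since in $\mathcal{T}_{\mathbb{R}}$ there is one more
case: $x \in (1, 2)$. Also, note that $c_4$ cannot be a potential in $\mathcal{T}_{\mathbb{Z}}$ (not even with a
collaboration between environment and system), whereas it can in $\mathcal{T}_{\mathbb{R}}$.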


3.2 The Boolean Abstraction Algorithm 


Boolean abstraction is a method to compute $\varphi_{\mathbb{B}}$ from $\varphi_{\mathcal{T}}$. In this section we
describe and prove correct a basic brute-force version of this method, and later
in Sect. 4, we present faster algorithms. All Boolean abstraction algorithms that
we present in this paper first compute the extra requirement, by visiting the set
of reactions and computing a subset of the valid reactions that is sufficient to
preserve realizability. The three main building blocks of our algorithms are (1)
the stop criteria of the search for reactions; (2) how to obtain the next reaction to
consider; and (3) how to modify the current set of valid reactions (by adding new
valid reactions to it) and the set of remaining reactions (by pruning the search
space). Finally, after the loop, the algorithm produces as $\varphi^{\mathit{extra}}$ a conjunction of
cases, one per valid reaction $(P, A)$ in $\mathit{VR}$.

We introduce a fresh variable $e_{(P,A)}$, controlled by the environment, for each
valid reaction $(P, A)$, to capture that the environment plays values for $\overline{x}$ that
correspond to the case where the system is left with the power to choose captured
precisely by $(P, A)$. Therefore, there is one additional environment Boolean variable
per valid reaction (in practice we can enumerate the valid reactions and
introduce only a logarithmic number of environment variables). Finally, the extra
requirement uses $P$ for each valid reaction $(P, A)$ to encode the potential moves of
the system as a disjunction of the literals described by each choice in $P$. Each of
these disjunctions contains precisely the combinations of literals that are possible
for the concrete case that $(P, A)$ captures.

Algorithm 1: Brute-force
 1  Input: $\varphi_{\mathcal{T}}$
 2  $\varphi' \leftarrow \varphi_{\mathcal{T}}[l_i \leftarrow s_i]$ ; $\mathit{VR} \leftarrow \{\}$
 3  $\mathcal{C} \leftarrow \mathit{choices}(\mathit{literals}(\varphi_{\mathcal{T}}))$
 4  $\mathcal{R} \leftarrow 2^{\mathcal{C}}$
 5  for $(P, A) \in \mathcal{R}$ do
 6      if $\exists \overline{x}.\, \mathit{react}_{(P,A)}(\overline{x})$ then
 7          $\mathit{VR} \leftarrow \mathit{VR} \cup \{(P, A)\}$
 8  $\varphi^{\mathit{extra}} \leftarrow \mathit{getExtra}(\mathit{VR})$
 9  return $\varphi' \wedge \Box(A \rightarrow \varphi^{\mathit{extra}})$
© 
A brute-force algorithm that implements the Boolean abstraction method by
exhaustively searching all reactions is shown in Algorithm 1. The building blocks
of this algorithm are:

(1) It stops when the remaining set of reactions is empty.

(2) It traverses the set $\mathcal{R}$ according to some predetermined order.

(3) To modify the set of valid reactions, if $(P, A)$ is valid it adds $(P, A)$ to the
    set $\mathit{VR}$ (line 7). To modify the set of remaining reactions, it removes $(P, A)$
    from the search.

Finally, the extra sub-formula $\varphi^{\mathit{extra}}$ is generated by $\mathit{getExtra}$ (line 8), defined as
follows:

\[ \mathit{getExtra}(\mathit{VR}) = \bigwedge_{(P,A) \in \mathit{VR}} \Big( e_{(P,A)} \rightarrow \bigvee_{c \in P} \big( \bigwedge_{l_i \in c} s_i \;\wedge\; \bigwedge_{l_i \notin c} \neg s_i \big) \Big) \]
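The shape of the formula produced by getExtra can be illustrated with a short sketch that renders it as a string. The textual syntax (`->`, `&`, `|`, `!`), the function names and the encoding of a choice as a set of literal indices are assumptions of this sketch; the input reuses the potentials that Ex. 5 reports for $\mathcal{T}_{\mathbb{Z}}$, with environment variables named d0 and d1.

```python
def choice_to_cube(choice, num_literals):
    """Conjunction fixing every s_i: positive if l_i is in the choice, else negated."""
    lits = [f"s{i}" if i in choice else f"!s{i}" for i in range(num_literals)]
    return "(" + " & ".join(lits) + ")"

def get_extra(valid_reactions, num_literals):
    """One implication per valid reaction: env variable -> disjunction of its potentials."""
    cases = []
    for env_var, potentials in valid_reactions:
        cubes = [choice_to_cube(c, num_literals) for c in potentials]
        cases.append(f"({env_var} -> ({' | '.join(cubes)}))")
    return " & ".join(cases)

# Potentials {c1, c2, c3} for d0 and {c5, c6} for d1, as sets of literal indices.
VR = [
    ("d0", [{0, 1}, {0, 2}, {0}]),
    ("d1", [{1}, {2}]),
]
EXTRA = get_extra(VR, 3)
print(EXTRA)
```

Only the potentials $P$ of each valid reaction are needed here, since the antipotentials are exactly the remaining choices.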


Note that there is an $\exists^*\forall^*$ validity query in the body of the loop (line 6) to
check whether the candidate reaction is valid. This is why decidability of the
$\exists^*\forall^*$ fragment is crucial, because it captures the finite partitioning of the
environment moves (which is existentially quantified) for which the system can react
in certain ways (i.e., potentials, which are existentially quantified) by picking
appropriate valuations but not in others (i.e., antipotentials, which are universally
quantified). In essence, the brute-force algorithm iterates over all the
reactions, one at a time, checking whether each reaction is valid or not. In case
the reaction (characterized by the set of potential choices¹) is valid, it is added
to $\mathit{VR}$.


Example 5. Consider again the specification in Ex. 1, with $\mathcal{T}_{\mathbb{Z}}$ as theory. Note
that the valid reactions are $r_1$ and $r_2$, as shown in Ex. 4, where the potentials of
$r_1$ are $\{c_1, c_2, c_3\}$ and the potentials of $r_2$ are $\{c_5, c_6\}$. Now, the creation of $\varphi^{\mathit{extra}}$
requires two fresh variables $d_0$ and $d_1$ for the environment (they correspond to
environment decisions $(x < 2)$ and $(x \geq 2)$, respectively), resulting in:

\[ \varphi^{\mathit{extra}} : \begin{array}{l} d_0 \rightarrow \big((s_0 \wedge s_1 \wedge \neg s_2) \vee (s_0 \wedge \neg s_1 \wedge s_2) \vee (s_0 \wedge \neg s_1 \wedge \neg s_2)\big) \\ \wedge \\ d_1 \rightarrow \big((\neg s_0 \wedge s_1 \wedge \neg s_2) \vee (\neg s_0 \wedge \neg s_1 \wedge s_2)\big) \end{array} \]

For example, $c_3 = \{s_0\}$ is a choice that appears as potential in valid reaction
$r_1$, so it appears as a disjunct of $d_0$ as $(s_0 \wedge \neg s_1 \wedge \neg s_2)$. The resulting Booleanized
specification $\varphi_{\mathbb{B}}$ is as follows:

\[ \varphi_{\mathbb{B}} = \big(\varphi'' \wedge \Box(A_{\mathbb{B}} \rightarrow \varphi^{\mathit{extra}})\big) \]

Note that the Boolean encoding is extended with an assumption formula
$A_{\mathbb{B}} = (d_0 \leftrightarrow \neg d_1) \wedge (d_0 \vee d_1)$ that restricts environment moves to guarantee that
exactly one environment decision variable is picked. Also, note that a Boolean
abstraction algorithm will output three (instead of two) decisions for the environment,
but we acknowledge that one of them will never be played by it, since
it gives strictly more power to the system. The complexity of this brute-force
Booleanization algorithm is doubly exponential in the number of literals.

¹ The potentials in a choice characterize the precise power of the system player,
because the potentials correspond with what the system can respond.
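The brute-force loop can be mimicked on the running example without an SMT solver. In the sketch below, the $\exists^*\forall^*$ query of line 6 is replaced by enumeration over bounded integer windows for $x$ and $y$, which is an assumption of this sketch and only a stand-in for a real decision procedure; all names are illustrative. On this input it reports three potential sets, in line with the remark above that the algorithm outputs three environment decisions for $\mathcal{T}_{\mathbb{Z}}$.

```python
from itertools import product

# Literals of Ex. 1 as Python predicates over (x, y).
LITERALS = [
    lambda x, y: x < 2,   # l0, abstracted by s0
    lambda x, y: y > 1,   # l1, abstracted by s1
    lambda x, y: y < x,   # l2, abstracted by s2
]
X_RANGE = range(-10, 11)  # bounded stand-in for the quantifier over x
Y_RANGE = range(-30, 31)  # bounded stand-in for the quantifiers over y

def choice_index(truth_values):
    """Index k of choice c_k: c_0 makes every literal true, c_7 none."""
    k = 0
    for value in truth_values:
        k = 2 * k + (0 if value else 1)
    return k

def power_of_system(x):
    """Choices the system can realize after the environment plays x."""
    return frozenset(
        choice_index([lit(x, y) for lit in LITERALS]) for y in Y_RANGE
    )

def brute_force():
    """Visit all 2^8 reactions and keep those witnessed by some x."""
    powers = {power_of_system(x) for x in X_RANGE}
    valid = []
    for bits in product([True, False], repeat=2 ** len(LITERALS)):
        potentials = frozenset(k for k, b in enumerate(bits) if b)
        # react_(P,A) holds at x exactly when the power at x equals P.
        if potentials in powers:
            valid.append(potentials)
    return valid

for potentials in brute_force():
    print(sorted(potentials))
```

The loop body runs 256 times for three literals, which already hints at why the exhaustive traversal does not scale.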


3.3 From Local Simulation to Equi-Realizability 


The intuition about the correctness of the algorithm is that the extra requirement 
encodes precisely all reactions (i.e., collections of choices), for which there is a 
move of the environment that leaves the system with precisely that power to 
respond. As an observation, in the extra requirement, the set of potentials in 
valid reactions cannot be empty. This is stated in Lemma 1. 


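Lemma 1. Let $C \subseteq \mathcal{C}$ be such that $\mathit{react}_C \in \mathit{VR}$. Then $C \neq \emptyset$.


Proof. Bear in mind that $\mathit{react}_C \in \mathit{VR}$ is valid. Let $v$ be such that $\mathit{react}_C[\overline{x} \leftarrow v]$ is
valid. Let $w$ be an arbitrary valuation of $\overline{y}$ and let $c$ be the following choice, where $l$ ranges over the literals:

\[ c = \bigwedge_{l[\overline{x} \leftarrow v,\, \overline{y} \leftarrow w] \text{ is true}} l \;\;\wedge \bigwedge_{l[\overline{x} \leftarrow v,\, \overline{y} \leftarrow w] \text{ is false}} \neg l \]

It follows that $I[\overline{x} \leftarrow v] \vDash \exists \overline{y}.\, c$, so $c \in C$.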


Lemma 1 is crucial, because it ensures that once a Boolean abstraction algorithm
is executed, for each fresh $e$ variable in the extra requirement, the corresponding
reaction has one or more potentials with which the system can respond.
Therefore, in each position in the realizability game, the system can respond
to moves of the environment leading to precisely corresponding positions in the
Boolean game. In turn, this leads to equi-realizability because each move can
be simulated in the corresponding game. Concretely, it is easy to see that we
can define a simulation between the positions of the games for $\varphi_{\mathcal{T}}$ and $\varphi_{\mathbb{B}}$ such
that (1) each literal $l_i$ and the corresponding variable $s_i$ have the same truth
value in related positions, (2) the extra requirement is always satisfied, and (3)
moves of the system in each game from related positions can be
mimicked in the other game. This is captured by the following theorem:


Theorem 1. System wins $G^{\mathcal{T}}$ if and only if System wins the game $G^{\mathbb{B}}$. Therefore,
$\varphi_{\mathcal{T}}$ is realizable if and only if $\varphi_{\mathbb{B}}$ is realizable.


Proof. (Sketch). Since realizability games are memory-less determined, it is sufficient
to consider only local strategies. Given a strategy $\rho_{\mathbb{B}}$ that is winning in
$G^{\mathbb{B}}$ we define a strategy $\rho_{\mathcal{T}}$ in $G^{\mathcal{T}}$ as follows. Assuming related positions, $\rho_{\mathcal{T}}$
moves in $G^{\mathcal{T}}$ to the successor that is related to the position where $\rho_{\mathbb{B}}$ moves in
$G^{\mathbb{B}}$. By (3) above, it follows that for every play played in $G^{\mathbb{B}}$ according to $\rho_{\mathbb{B}}$
there is a play in $G^{\mathcal{T}}$ played according to $\rho_{\mathcal{T}}$ that results in the same trace, and
vice-versa: for every play played in $G^{\mathcal{T}}$ according to $\rho_{\mathcal{T}}$ there is a play in $G^{\mathbb{B}}$
played according to $\rho_{\mathbb{B}}$ that results in the same trace. Since $\rho_{\mathbb{B}}$ is winning, so is
$\rho_{\mathcal{T}}$. The other direction follows similarly, because again $\rho_{\mathbb{B}}$ can be constructed
from $\rho_{\mathcal{T}}$ not only guaranteeing the same valuation of literals and corresponding
variables, but also that the extra requirement holds in the resulting position.


The following corollary of Thm. 1 follows immediately.


Theorem 2. Let $\mathcal{T}$ be a theory with a decidable $\exists^*\forall^*$-fragment. Then, $\mathrm{LTL}_{\mathcal{T}}$
realizability is decidable.


4 Efficient Algorithms for Boolean Abstraction 


4.1 Quasi-reactions 


The basic algorithm presented in Sect. 3 exhaustively traverses the set of reactions,
one at a time, checking whether each reaction is valid. Therefore, the
body of the loop is visited $2^{|\mathcal{C}|}$ times. In practice, the running time of this basic
algorithm quickly becomes infeasible.

We now improve Alg. 1 by exploiting the observation that every SMT query 
for the validity of a reaction reveals information about the validity of other 
reactions. We will exploit this idea by learning uninteresting subsequent sets of 
reactions and pruning the search space. The faster algorithms that we present 
below encode the remaining search space using a SAT formula, whose models 
are further reactions to explore. 

To implement the learning-and-pruning idea we first introduce the notion of 
quasi-reaction. 


Definition 4 (Quasi-reaction). A quasi-reaction is a pair $(P, A)$ where $P \subseteq \mathcal{C}$,
$A \subseteq \mathcal{C}$ and $P \cap A = \emptyset$.


Quasi-reactions remove from reactions the constraint that $P \cup A = \mathcal{C}$. A quasi-reaction
represents the set of reactions that would be obtained from choosing
the remaining choices that are neither in $P$ nor in $A$ as either potential or
antipotential. The set of quasi-reactions is:

\[ \mathcal{Q} = \{(P, A) \mid P, A \subseteq \mathcal{C} \text{ and } P \cap A = \emptyset\} \]

Note that $\mathcal{R} = \{(P, A) \in \mathcal{Q} \mid P \cup A = \mathcal{C}\}$.

Example 6. Consider a case with four choices $c_0$, $c_1$, $c_2$ and $c_3$. The quasi-reaction
$(\{c_0, c_2\}, \{c_1\})$ corresponds to the following formula:

\[ \exists \overline{x}.\, \big(\exists \overline{y}.\, f(c_0(\overline{x},\overline{y})) \;\wedge\; \forall \overline{y}.\, \neg f(c_1(\overline{x},\overline{y})) \;\wedge\; \exists \overline{y}.\, f(c_2(\overline{x},\overline{y}))\big) \]

Note that nothing is stated in this quasi-reaction about $c_3$ (it neither acts as a
potential nor as an antipotential).


Consider the following order between quasi-reactions: $(P, A) \preceq (P', A')$ holds
if and only if $P \subseteq P'$ and $A \subseteq A'$. It is easy to see that $\preceq$ is a partial order,
that $(\emptyset, \emptyset)$ is the lowest element and that for every two elements $(P, A)$ and
$(P', A')$ there is a greatest lower bound (namely $(P \cap P', A \cap A')$). Therefore
$(P, A) \sqcap (P', A') \stackrel{\text{def}}{=} (P \cap P', A \cap A')$ is a meet operation (it is associative,
commutative and idempotent). Note that $q \preceq q'$ if and only if $q \sqcap q' = q$.
Formally:


Proposition 1. $(\mathcal{Q}, \sqcap)$ is a lower semi-lattice.


The quasi-reaction semi-lattice represents how informative a quasi-reaction 
is. Given a quasi-reaction (P, A), removing an element from either P or A results 
in a strictly less informative quasi-reaction. The lowest element (Ø, Ø) contains 
the least information. 

Given a quasi-reaction $q$, the set $\mathcal{Q}_q = \{q' \in \mathcal{Q} \mid q' \preceq q\}$ of the quasi-reactions
below $q$ forms a full lattice with join $(P, A) \sqcup (P', A') \stackrel{\text{def}}{=} (P \cup P', A \cup A')$. This
is well defined because $P'$ and $A$, and $P$ and $A'$, are guaranteed to be disjoint.

Proposition 2. For every $q$, $(\mathcal{Q}_q, \sqcap, \sqcup)$ is a lattice.
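The order, meet and join on quasi-reactions translate directly into set operations. The following sketch, whose function names and pair-of-frozensets representation are assumptions made for illustration, checks the characterization $q \preceq q'$ iff $q \sqcap q' = q$ on the quasi-reaction of Ex. 6.

```python
def quasi(potentials, antipotentials):
    """Build a quasi-reaction (P, A), checking that P and A are disjoint."""
    p, a = frozenset(potentials), frozenset(antipotentials)
    assert not (p & a), "P and A must be disjoint"
    return (p, a)

def leq(q1, q2):
    """(P, A) is below (P', A') iff P is a subset of P' and A is a subset of A'."""
    return q1[0] <= q2[0] and q1[1] <= q2[1]

def meet(q1, q2):
    """Greatest lower bound: componentwise intersection."""
    return (q1[0] & q2[0], q1[1] & q2[1])

def join(q1, q2):
    """Componentwise union; only meant for two quasi-reactions below a common q."""
    return (q1[0] | q2[0], q1[1] | q2[1])

q = quasi({"c0", "c2"}, {"c1"})      # the quasi-reaction of Ex. 6
q_low = quasi({"c0"}, {"c1"})
bottom = quasi(set(), set())

print(leq(q_low, q), meet(q_low, q) == q_low)
print(leq(bottom, q_low))
print(join(q_low, quasi({"c2"}, set())) == q)
```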


As for reactions, quasi-reactions correspond to a formula in the theory as
follows:

\[ \mathit{qreact}_{(P,A)}(\overline{x}) = \bigwedge_{c \in P} \big(\exists \overline{y}.\, f(c(\overline{x},\overline{y}))\big) \;\wedge\; \bigwedge_{c \in A} \big(\forall \overline{y}.\, \neg f(c(\overline{x},\overline{y}))\big) \]

Again, given a quasi-reaction $q$, if $\exists \overline{x}.\, \mathit{qreact}_q(\overline{x})$ is valid we say that $q$ is valid,
otherwise we say that $q$ is invalid. The following holds directly from the definition
(and the fact that adding conjuncts makes a first-order formula "less
satisfiable").


Proposition 3. Let $q, q'$ be two quasi-reactions with $q \preceq q'$. If $q$ is invalid then
$q'$ is invalid. If $q'$ is valid then $q$ is valid.


These results enable the following optimizations. 


4.2 Quasi-reaction-based Optimizations 


A Logic-Based Optimization. Consider that, during the search for valid
reactions in the main loop, a reaction $(P, A)$ is found to be invalid, that is
$\mathit{react}_{(P,A)}$ is unsatisfiable. If the algorithm explores the quasi-reactions below
$(P, A)$, finding $(P', A') \preceq (P, A)$ such that $\mathit{qreact}_{(P',A')}$ is unsatisfiable, then by Prop. 3, every
reaction $(P'', A'')$ above $(P', A')$ is guaranteed to be invalid. This allows pruning
the search in the main loop by computing a more informative quasi-reaction $q$
after an invalid reaction $r$ is found, and skipping all reactions above $q$ (and not
only $r$). For example, if the reaction corresponding to $(\{c_0, c_2, c_3\}, \{c_1\})$ is found
to be invalid, and by exploring quasi-reactions below it, we find that $(\{c_0\}, \{c_1\})$
is also invalid, then we can skip all reactions above $(\{c_0\}, \{c_1\})$. This includes
for example $(\{c_0, c_2\}, \{c_1, c_3\})$ and $(\{c_0, c_3\}, \{c_1, c_2\})$. In general, the lower the
invalid quasi-reaction in $\preceq$, the more reactions will be pruned. This optimization
resembles the standard selection of maximal/minimal elements in an antichain.
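To make the effect of this pruning concrete, the following sketch lists the reactions that lie above an invalid quasi-reaction, for the four-choice example in the text. The function name and the representation of reactions as pairs of frozensets are assumptions of this sketch.

```python
from itertools import product

CHOICES = ["c0", "c1", "c2", "c3"]

def reactions_above(potentials, antipotentials, choices):
    """All reactions (P', A') that extend (P, A) to a full partition of choices."""
    free = [c for c in choices if c not in potentials and c not in antipotentials]
    result = []
    for bits in product([True, False], repeat=len(free)):
        extra_p = {c for c, b in zip(free, bits) if b}
        extra_a = set(free) - extra_p
        result.append((frozenset(potentials | extra_p),
                       frozenset(antipotentials | extra_a)))
    return result

# Once ({c0}, {c1}) is known to be invalid, all of these can be skipped.
pruned = reactions_above({"c0"}, {"c1"}, CHOICES)
for p, a in pruned:
    print(sorted(p), sorted(a))
```

A quasi-reaction that leaves $k$ choices undecided stands for $2^k$ reactions, so learning a lower invalid quasi-reaction removes exponentially more candidates.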


A Game-Based Optimization. Consider now two reactions $r = (P, A)$ and
$r' = (P', A')$ such that $P \subseteq P'$ and assume that both are valid reactions. Since
$r'$ allows more choices to the system (because the potentials $P'$ determine these
choices), the environment player will always prefer to play $r$ rather than $r'$. Formally, if
there is a winning strategy for the environment that chooses values for $\overline{x}$ (corresponding
to a model of $\mathit{react}_{r'}$), then choosing values for $\overline{x}'$ instead (corresponding
to a model of $\mathit{react}_r$) will also be winning.

Therefore, if a reaction $r$ is found to be valid, we can prune the search for reactions
$r'$ that contain strictly more potentials, because even if $r'$ is also valid, it will
be less interesting for the environment player. For instance, if $(\{c_0, c_3\}, \{c_1, c_2\})$
is valid, then $(\{c_0, c_1, c_3\}, \{c_2\})$ and $(\{c_0, c_1, c_3, c_2\}, \{\})$ become uninteresting to
be explored and can be pruned from the search.
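The reactions discarded by the game-based optimization are those whose potential set strictly contains the potentials of a valid reaction. A minimal sketch, with illustrative names, for the example in the text:

```python
from itertools import combinations

CHOICES = ["c0", "c1", "c2", "c3"]

def dominated_reactions(potentials, choices):
    """Reactions whose set of potentials strictly contains `potentials`."""
    rest = [c for c in choices if c not in potentials]
    result = []
    for size in range(1, len(rest) + 1):
        for extra in combinations(rest, size):
            p = frozenset(potentials) | frozenset(extra)
            a = frozenset(choices) - p
            result.append((p, a))
    return result

# If ({c0, c3}, {c1, c2}) is valid, these reactions need not be explored.
skipped = dominated_reactions({"c0", "c3"}, CHOICES)
for p, a in skipped:
    print(sorted(p), sorted(a))
```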


4.3 A Single Model-Loop Algorithm (Algorithm 2) 


We now present a faster algorithm that replaces the main loop of Algorithm 1,
which performs exhaustive exploration, with a SAT-based search procedure that
prunes uninteresting reactions. In order to do so, we use a SAT formula $\psi$ with
one variable $z_i$ per choice $c_i$, in a DPLL(T) fashion. An assignment $v : \mathit{Vars}(\psi) \rightarrow \mathbb{B}$
to these variables represents a reaction $(P, A)$ where

\[ P = \{c_i \mid v(z_i) = \mathit{true}\} \qquad A = \{c_i \mid v(z_i) = \mathit{false}\} \]

Similarly, a partial assignment $v : \mathit{Vars}(\psi) \rightharpoonup \mathbb{B}$ represents a quasi-reaction.
The intended meaning of $\psi$ is that its models encode the set of interesting
reactions that remain to be explored. This formula is initialized with
$\psi = \mathit{true}$ (note that $\neg(\bigwedge_{z_i} \neg z_i)$ is also a correct starting point because the
reaction where all choices are antipotentials is invalid). Then, a SAT query
is used to find a satisfying assignment for $\psi$, which corresponds to a
(quasi-)reaction $r$ whose validity is interesting to be explored. Algorithm 2 shows
the Model-loop algorithm.

Algorithm 2: Model-loop
10  Input: $\varphi_{\mathcal{T}}$
11  $\varphi' \leftarrow \varphi_{\mathcal{T}}[l_i \leftarrow s_i]$ ; $\mathit{VR} \leftarrow \{\}$
12  $\mathcal{C} \leftarrow \mathit{choices}(\mathit{literals}(\varphi_{\mathcal{T}}))$
13  $\psi \leftarrow \top$
14  while $\mathit{SAT}(\psi)$ do
15      $m \leftarrow \mathit{model}(\psi)$
16      if $\exists \overline{x}.\, \mathit{toTheory}(m, \mathcal{C})$ then
17          $P \leftarrow \mathit{posVars}(m)$
18          $\psi \leftarrow \psi \wedge \neg(\bigwedge_{p \in P} p)$
19          $\mathit{VR} \leftarrow \mathit{VR} \cup \{(e, P)\}$
20      else
21          $N \leftarrow \mathit{negVars}(m)$
22          $\hat{m} \leftarrow \bigwedge_{n \in N} \neg n$
23          if $\exists \overline{x}.\, \mathit{toTheory}(\hat{m}, \mathcal{C})$ then
24              $\psi \leftarrow \psi \wedge \neg m$
25          else
26              $\psi \leftarrow \psi \wedge \neg \hat{m}$
27  $\varphi^{\mathit{extra}} \leftarrow \mathit{getExtra}(\mathit{VR})$
28  return $\varphi' \wedge \Box(A \rightarrow \varphi^{\mathit{extra}})$

The three main building blocks of the model-loop algorithm are:

(1) Algorithm 2 stops when $\psi$ is unsatisfiable (line 14).

(2) To explore a new reaction, Algorithm 2 obtains a satisfying assignment
    for $\psi$ (line 15).

(3) Algorithm 2 checks the validity of the reaction (line 16) and enriches
    $\psi$ to prune according to what can be learned, as follows:

    - If the reaction is invalid (as a result of the SMT query in line
      16), then it checks the validity of quasi-reaction $q = (\emptyset, A)$ in line
      23. If $q$ is invalid, add the negation of $q$ as a new conjunct of $\psi$ (line
      26). If $q$ is valid, add the negation of the reaction (line 24). This prevents
      all SAT models that agree with one of these $q$, which correspond
      to reactions $r'$ with $q \preceq r'$, including $r$.

    - If the reaction is valid, then it is added to the set of valid reactions
      $\mathit{VR}$ and the corresponding quasi-reaction that results from removing the
      antipotentials is added (negated) to $\psi$ (line 18), preventing the exploration
      of uninteresting cases, according to the game-based optimization.


As for the notation in Algorithm 2 (also in Algorithm 3 and Algorithm 4),
$\mathit{model}(\psi)$ in line 15 is a function that returns a satisfying assignment of the SAT
formula $\psi$, $\mathit{posVars}(m)$ returns the positive variables of $m$ (e.g., $c_i$, $c_j$ etc.) and
$\mathit{negVars}(m)$ returns the negative variables. Finally, $\mathit{toTheory}(m, \mathcal{C}) = \bigwedge_{m_i} c_i^p \wedge \bigwedge_{\neg m_i} c_i^a$
(in lines 16 and 23) translates a Boolean formula into its corresponding
formula in the given theory $\mathcal{T}$. Note that an unsatisfiable $m$ can be minimized by
finding cores.

If $r$ is invalid and $(\emptyset, A)$ is also found to be invalid, then exponentially many
cases can be pruned. Similarly, if $r$ is valid, exponentially many cases can also
be pruned. The following result shows the correctness of Algorithm 2:


Theorem 3. Algorithm 2 terminates and outputs a correct Boolean abstraction.


Proof. (Sketch). Algorithm 2 terminates because, at each step in the loop, $\psi$
removes at least one satisfying assignment and the total number is bounded by
$2^{|\mathcal{C}|}$. Also, the correctness of the generated formula is guaranteed because, for
every valid reaction in Algorithm 1, either there is a valid reaction found in
Algorithm 2 or a more promising reaction found in Algorithm 2.
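The control flow of the model loop can be sketched without a SAT or SMT solver. In the sketch below, both solvers are replaced by assumptions: a fixed enumeration order over assignments plays the role of the SAT solver, the list `blocked` plays the role of the conjuncts added to $\psi$, and `is_valid` is a hypothetical oracle for the validity of a quasi-reaction. The oracle used in the usage part hard-codes the three system powers of the running example over the integers.

```python
from itertools import product

def model_loop(choices, is_valid):
    """Return the potential sets of the valid reactions found by the model loop."""
    universe = frozenset(choices)
    blocked = []   # quasi-reactions (P, A); psi excludes every reaction above one
    found = []
    # A fixed enumeration order replaces the SAT solver's choice of model.
    for bits in product([False, True], repeat=len(choices)):
        p = frozenset(c for c, b in zip(choices, bits) if b)
        a = universe - p
        if any(bp <= p and ba <= a for bp, ba in blocked):
            continue                             # no longer a model of psi
        if is_valid(p, a):                       # line 16
            found.append(p)
            blocked.append((p, frozenset()))     # line 18: game-based pruning
        elif is_valid(frozenset(), a):           # line 23
            blocked.append((p, a))               # line 24: exclude only this reaction
        else:
            blocked.append((frozenset(), a))     # line 26: exclude all above (empty, A)
    return found

# Hypothetical oracle: the powers of the system in the running example.
POWERS = [frozenset({1, 2, 3}), frozenset({5, 6}), frozenset({4, 5, 6})]

def is_valid(p, a):
    """(P, A) is valid iff some power contains P and avoids A."""
    return any(p <= power and not (a & power) for power in POWERS)

result = model_loop(list(range(8)), is_valid)
print([sorted(p) for p in result])
```

With this enumeration order the reaction with potentials $\{c_4, c_5, c_6\}$ is never queried, because it is pruned as soon as the reaction with potentials $\{c_5, c_6\}$ is found to be valid.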
4.4 A Nested-SAT Algorithm (Algorithm 3)


We now present an improvement of Algorithm 2 that performs a more detailed
search for a promising collection of invalid quasi-reactions under an invalid reaction
$r$.

Algorithm 3: Nested-SAT
29  Input: $\varphi_{\mathcal{T}}$
30  $\varphi' \leftarrow \varphi_{\mathcal{T}}[l_i \leftarrow s_i]$ ; $\mathit{VR} \leftarrow \{\}$
31  $\mathcal{C} \leftarrow \mathit{choices}(\mathit{literals}(\varphi_{\mathcal{T}}))$
32  $\psi \leftarrow \top$
33  while $\mathit{SAT}(\psi)$ do
34      $m \leftarrow \mathit{model}(\psi)$
35      if $\exists \overline{x}.\, \mathit{toTheory}(m, \mathcal{C})$ then
36          $P \leftarrow \mathit{posVars}(m)$
37          $\psi \leftarrow \psi \wedge \neg(\bigwedge_{p \in P} p)$
38          $\mathit{VR} \leftarrow \mathit{VR} \cup \{(e, P)\}$
39      else
40          $N \leftarrow \mathit{negVars}(m)$
41          $\psi \leftarrow \psi \wedge \neg m$
42          $I \leftarrow \mathit{inner\text{-}loop}(m, \mathcal{C})$
43          $\psi \leftarrow \psi \wedge \bigwedge_{i \in I} \neg i$
44  $\varphi^{\mathit{extra}} \leftarrow \mathit{getExtra}(\mathit{VR})$
45  return $\varphi' \wedge \Box(A \rightarrow \varphi^{\mathit{extra}})$

Note that it is not necessary to find the precise collection of all the smallest
quasi-reactions that are under an invalid reaction $r$, as long as at least one
quasi-reaction under $r$ is calculated (perhaps, $r$ itself). Finding lower quasi-reactions
allows pruning more, but its calculation is more costly, because more SMT queries
need to be performed. The Nested-SAT algorithm (Algorithm 3) explores (using
an inner SAT encoding) this trade-off between computing more exhaustively
better invalid quasi-reactions and the cost of the search. The three main building
blocks of the nested-SAT algorithm (see Algorithm 3) are:

(1) It stops when $\psi$ is unsatisfiable (as in Algorithm 2), in line 33.

(2) To get the reaction, obtain a satisfying assignment $m$ for $\psi$ (as in Algorithm 2),
    in line 34.

(3) Check the validity of the corresponding reaction and prune $\psi$ according to
    what can be learned as follows. If the reaction is valid, then we proceed as
    in Algorithm 2. If $r = (P, A)$ is invalid (as a result of the SMT query), then
    an inner SAT formula encodes whether a choice is masked (eliminated from
    $P$ or $A$). Models of the inner SAT formula, therefore, correspond to quasi-reactions
    below $r$. If a quasi-reaction $q$ found in the inner loop is invalid,
    the inner formula is additionally constrained and the set of invalid quasi-reactions
    is expanded. If a quasi-reaction $q$ found is valid, then the inner
    SAT formula is pruned eliminating all quasi-reactions that are guaranteed
    to be valid. At the end of the inner loop, a (non-empty) collection of invalid
    quasi-reactions is added to $\psi$.

The inner loop, shown in Algorithm 4 (where $\mathit{VQ}$ stands for valid quasi-reactions),
explores a full lattice.
Algorithm 4: Inner loop
46  Input: $m$, $\mathcal{C}$
47  $\mathit{VQ} \leftarrow \{\}$ ; $\beta \leftarrow \top$
48  while $\mathit{SAT}(\beta)$ do
49      $u \leftarrow \mathit{model}(\beta)$
50      if $\exists \overline{x}.\, \mathit{toTheory\_inn}(u, m, \mathcal{C})$ then
51          $P \leftarrow \mathit{posVars}(u)$
52          $\beta \leftarrow \beta \wedge \neg(\bigwedge_{p \in P} p)$
53      else
54          $N \leftarrow \mathit{negVars}(u)$
55          $\beta \leftarrow \beta \wedge (\bigvee_{n \in N} n)$
56          $\mathit{VQ} \leftarrow \mathit{VQ} \cup \{q\}$
57  return $\mathit{VQ}$

Also, note that $\neg(\bigwedge_{z_i} \neg z_i)$ is, again, a correct starting point. Consider, for
example, that the outer loop finds $(\{c_1, c_3\}, \{c_0, c_2\})$ to be invalid and that
the inner loop produces assignment $w_0 \wedge w_1 \wedge w_2 \wedge \neg w_3$. This corresponds to
$c_3$ being masked, producing quasi-reaction $(\{c_1\}, \{c_0, c_2\})$. The pruning system is the
following:

- If quasi-reaction $q$ is valid then the inner SAT formula is pruned eliminating
  all inner models that agree with the model in the masked choices. In our
  example, we would prune all models that satisfy $\neg w_3$ if $q$ is valid (because
  the resulting quasi-reactions will be inevitably valid).

- If quasi-reaction $q$ is invalid, then we prune in the inner search all quasi-reactions
  that mask less than $q$, because these will be inevitably invalid. In
  our example, we would prune all models satisfying $w_0 \wedge w_1 \wedge w_2$ (by adding
  $\neg(w_0 \wedge w_1 \wedge w_2)$ to the inner formula).


Note that $\mathit{toTheory\_inn}(u, m, \mathcal{C}) = \bigwedge_{m_i \wedge u_i} c_i^p \wedge \bigwedge_{\neg m_i \wedge u_i} c_i^a$ is not the same function
as the $\mathit{toTheory}()$ used in Algorithm 2 and Algorithm 3, since the inner
loop needs both model $m$ and mask $u$ (which makes no sense to be negated) to
translate a Boolean formula into a $\mathcal{T}$-formula. Also, note that there is again a
trade-off in the inner loop because an exhaustive search is not necessary. Thus,
in practice, we also used some basic heuristics: (1) entering the inner loop only
when $(\emptyset, A)$ is invalid; (2) fixing a maximum number of inner model queries per
outer model with the possibility to decrement this amount dynamically with a
decay; and (3) reducing the number of times the inner loop is exercised (e.g.,
enter the inner loop only if the number of invalid outer models so far is even).
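The role of the mask in the inner loop can be illustrated as follows. The sketch applies a mask $u$ to an outer model $m$ and returns the resulting quasi-reaction, which is the object that the inner translation turns into a theory formula. The dictionary-based representation and the function name are assumptions of this sketch.

```python
def apply_mask(model, mask):
    """Quasi-reaction obtained from the reaction `model` by masking choices.

    model maps each choice to True (potential) or False (antipotential);
    mask maps each choice to True (kept) or False (masked out).
    """
    potentials = frozenset(c for c, v in model.items() if v and mask[c])
    antipotentials = frozenset(c for c, v in model.items() if not v and mask[c])
    return potentials, antipotentials

# Outer model ({c1, c3}, {c0, c2}) and inner assignment w0 & w1 & w2 & !w3.
m = {"c0": False, "c1": True, "c2": False, "c3": True}
u = {"c0": True, "c1": True, "c2": True, "c3": False}

print(apply_mask(m, u))
```

Masking $c_3$ yields $(\{c_1\}, \{c_0, c_2\})$, as in the example discussed for Algorithm 4.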


Example 7. We explore the results of Algorithm 3. A possible execution for 2
literals can be as follows:


1. Reaction $(\{c_0, c_3\}, \{c_1, c_2\})$ is obtained in line 34, which is declared invalid
   by the SMT solver in line 35. The inner loop called in line 42 produces
   $(\{c_0\}, \{c_1\})$, $(\{c_3\}, \{c_2\})$ and $(\{\}, \{c_1, c_2\})$ as three invalid quasi-reactions,
   and their negations are added to the SAT formula of the outer loop in line 43.

2. A second reaction $(\{c_0, c_1\}, \{c_2, c_3\})$ is obtained from the SAT solver in line
   34, and now the SMT solver query is valid in line 35. Then, $\neg(c_0 \wedge c_1)$ is
   added to the outer SAT formula in line 37.

3. A third reaction $(\{c_2, c_3\}, \{c_0, c_1\})$ is obtained in line 34, which is again valid
   in line 35. Similarly, $\neg(c_2 \wedge c_3)$ is added to the outer SAT formula in line 37.

4. A fourth reaction $(\{c_1, c_2\}, \{c_0, c_3\})$ is obtained in line 34, which is now
   invalid (line 35). The inner loop called in line 42 generates the following cores:
   $(\{c_1\}, \{c_0\})$ and $(\{c_2\}, \{c_3\})$. The addition of the negation of these cores leads
   to an unsatisfiable outer SAT formula, and the algorithm terminates.


The execution in this example has performed 4 SAT+SMT queries in the 
outer loop, and 3+2 SAT+SMT queries in the inner loops. The brute-force 
Algorithm 1 would have performed 16 queries. Note that the difference between 
the exhaustive version and the optimisations soon increases exponentially when 
we consider specifications with more literals. 


5 Empirical Evaluation 


We perform an empirical evaluation on six specifications inspired by real industrial
cases: Lift (Li.), Train (Tr.), Connect (Con.), Cooker (Coo.), Usb (Usb)
and Stage (St.), and a synthetic example (Syn.) with versions from 2 to 7 literals.
For the implementation, we used Python 3.8.8 with Z3 4.11.

It is easy to see that "clusters" of literals that do not share variables can
be Booleanized independently, so we split each of the examples into clusters.
We report our results in Fig. 2. Each row contains the result for a cluster of
an experiment (each one for the fastest heuristic). Each benchmark is split into
clusters, where we show the number of variables (vr.) and literals (lt.) per cluster.
We also show running times of each algorithm against each cluster; concretely,
we test Algorithm 1 (BF), Algorithm 2 (SAT) and Algorithm 3 (Doub.). For
Algorithm 2 and Algorithm 3, we show the number of queries performed; in the
case of Algorithm 3, we show both outer and inner queries. Algorithm 1
and Algorithm 2 require no heuristics. For Algorithm 3, we report, left to right:
maximum number of inner loops (MxI.), the modulo division criterion (Md.)², the
number of queries after which we perform a decay of 1 in the maximum number
of inner loops (Dc.), and whether we apply the invalidity of $(\emptyset, A)$ as a criterion to enter
the inner loop (A.), where ✓ means that we do and ✗ means the contrary. Also,
$\bot$ means timeout (or no data).
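Splitting a specification into clusters amounts to computing connected components of literals linked by shared variables. The following sketch shows one way to do this; the function name, the dictionary input format and the extra literal `z > 0` are hypothetical and only serve the illustration.

```python
def clusters(literal_vars):
    """Group literals that are connected through shared variables.

    literal_vars maps a literal name to the set of variables it mentions.
    Returns a list of (literals, variables) pairs, one per cluster.
    """
    remaining = dict(literal_vars)
    result = []
    while remaining:
        name, variables = remaining.popitem()
        group, group_vars = {name}, set(variables)
        changed = True
        while changed:                      # grow the cluster to a fixpoint
            changed = False
            for other, other_vars in list(remaining.items()):
                if group_vars & other_vars:
                    group.add(other)
                    group_vars |= other_vars
                    del remaining[other]
                    changed = True
        result.append((frozenset(group), frozenset(group_vars)))
    return result

example = {
    "x < 2": {"x"},
    "y > 1": {"y"},
    "y < x": {"x", "y"},
    "z > 0": {"z"},       # hypothetical literal over an unrelated variable
}
for lits, variables in clusters(example):
    print(sorted(lits), sorted(variables))
```

Since the cost of Boolean abstraction grows doubly exponentially with the number of literals, abstracting a cluster with 3 literals and another with 1 literal is far cheaper than abstracting 4 literals at once.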

The brute-force (BF) Algorithm 1 performs well with 3 or fewer literals, but
the performance dramatically decreases with 4 literals. Algorithm 2 (single SAT)
performs well up to 4 literals, and it can hardly handle cases with 6 or more literals.
An exception is Lift (1,7), which is simpler since it has only one variable (and
this implies that there is only one player). The performance improvement of SAT
with respect to BF is due to the reduction in the number of queries. For example, Train (3,6)
performs 13706 queries, whereas BF would need $2^{2^6} = 1.844 \cdot 10^{19}$ queries.

| Bn. | Cls. (vr, lt) | BF (s) | SAT (s) | Doub. (s) | SAT queries | Doub. queries (out/inn) | MxI. | Md. | Dc. | A. | Val. | Tme. (s) |
| Li. | (1, 7) | ⊥ | 6740 | 31.77 | 30375 | 72/1040 | 40 | 2 | 0 | ✓ | 1 | ? |
| | (2, 4) | 3911 | 0.70 | 0.91 | 27 | 25/20 | 10 | 2 | 0 | ✗ | 16 | |
| | (1, 3) | 3.64 | 1.19 | 0.52 | 46 | 10/20 | 10 | 2 | 0 | ✗ | 4 | |
| | (1, 2) | 0.23 | 0.09 | 0.14 | 4 | 4/3 | 3 | 3 | 0 | ✗ | 3 | |
| Tr. | (1, 3) | 3.18 | 0.04 | 0.96 | 16 | 26/20 | 10 | 2 | 0 | ✓ | 5 | ? |
| | (2, 1) | 0.05 | 0.04 | 0.04 | 2 | 2/0 | ? | ? | ? | ? | 2 | |
| | (1, 3) | 3.10 | 1.64 | 0.21 | 74 | 2/10 | 10 | 2 | 0 | ✓ | 1 | |
| | (1, 1) | 0.04 | 0.06 | 0.11 | 3 | 3/2 | 1 | 1 | 0 | ✓ | 1 | |
| | (3, 6) | ⊥ | 1269 | 112.5 | 13706 | 1170/4716 | 100 | 20 | 40 | ✗ | 15 | |
| | (4, 5) | ⊥ | 5251 | 4144 | 44177 | 52623/12332 | 100 | 20 | 40 | ✗ | 24 | |
| | (3, 5) | ⊥ | 2044 | 359.3 | 31363 | 9123/10158 | 100 | 20 | 40 | ✗ | 9 | |
| | (4, 12) | ⊥ | ⊥ | 6571 | ⊥ | 2728/40920 | 100 | 20 | 40 | ✗ | 104 | |
| Con. | (2, 2) | 0.23 | 0.09 | 0.09 | 4 | 4/0 | 3 | 3 | 0 | ✓ | 4 | 4.37 |
| Coo. | (3, 5) | ⊥ | 1356 | 2.81 | 27883 | 16/160 | 20 | 2 | 0 | ✓ | 1 | 3.64 |
| Usb | (2, 3) | 3.40 | 0.21 | 0.17 | 8 | 8/0 | 3 | 3 | 0 | ✓ | 8 | 3.93 |
| | (3, 5) | ⊥ | 231.9 | 364.4 | 5638 | 5638/0 | 20 | 2 | 0 | ✓ | 32 | |
| St. | (8, 8) | ⊥ | 18.19 | 18.20 | 256 | 256/0 | 40 | 2 | 0 | ✓ | 256 | 6.06 |
| | (3, 6) | ⊥ | 1311 | 194.8 | 14994 | 1697/6536 | 100 | 20 | 40 | ✗ | 45 | |
| Syn. | (2, 2) | 0.21 | 0.24 | 0.18 | 11 | 4/3 | 3 | 3 | 0 | ✓ | 2 | 4.12 |
| | (2, 3) | 3.42 | 2.69 | 1.24 | 119 | 14/40 | 10 | 2 | 0 | ✓ | 3 | 4.11 |
| | (2, 4) | 2842 | 108.6 | 16.51 | 3982 | 188/620 | 10 | 2 | 0 | ✓ | 3 | 4.28 |
| | (2, 5) | ⊥ | 7151 | 68.90 | 44259 | 380/2800 | 20 | 2 | 0 | ✓ | 11 | 4.53 |
| | (2, 6) | ⊥ | ⊥ | 402.2 | ⊥ | 4792/9941 | 100 | 20 | 40 | ✗ | 24 | 4.85 |
| | (2, 7) | ⊥ | ⊥ | 3596 | ⊥ | 7344/139440 | 40 | 2 | 0 | ✓ | 1 | ? |
| | (2, 7) | ⊥ | ⊥ | 3862 | ⊥ | 24311/40615 | 200 | 20 | 40 | ✗ | 45 | 5.99 |

Fig. 2. Empirical evaluation results of the different Boolean abstraction algorithms,
where the best results are in bold and $\varphi_{\mathbb{B}}$ only refers to best times.

All examples are Booleanizable when using Algorithm 3 (two SAT loops),
particularly when using a combination of concrete heuristics. For instance, in
small cases (2 to 5 literals) it seems that heuristic setups like 3/3/0/✓³ are
fast, whereas in bigger cases other setups like 40/2/0/✓ or 100/20/40/✗ are
faster. We conjecture that a non-zero decay is required to handle large inputs,
since inner loop exploration becomes less useful after some time. However, adding
a decay is not always faster than fixing a number of inner loops (see Syn (2,7)),
but it always yields better results in balancing the number of queries between
the two nested SAT layers. Thus, since balancing the number of queries typically
leads to faster execution times, we recommend using decays. Note that we
performed all the experiments reported in this section running all cases several
times and computing averages, because Z3 exhibited high volatility in the models
it produces, which in turn influenced the running time of our algorithms. This
significantly affects the precise reproducibility of the running times. For instance,
for Syn (2,5) the worst case execution was almost three times worse than the
average execution reported in Fig. 2. Studying this phenomenon more closely is
work in progress. Note that there are cases in which the number of queries of
SAT and Doub. are the same (e.g., Usb (3,5)), which happened when the A.
heuristic had the effect of making the search not enter the inner loop.

² This means that the inner loop is entered if and only if the number of invalid models
so far is divisible by Md, and we found Md values of 2, 3 and 20 to be interesting.

³ This means: we only perform 3 inner loop queries per outer loop query, we enter
the inner loop once per 3 outer loops, there is no decay (i.e., decay = 0), and we
only enter the inner loop if $(\emptyset, A)$ is invalid.

| Lits | Alg. | Performed queries (out+inn) | Out of | Needed queries (~%) |
| 2 | Alg. 2 | 4 | 16 | 25 |
| 3 | Alg. 2 | 8 | 256 | 3.125 |
| 4 | Alg. 3 | 83 + 380 | 65536 | 0.709 |
| 5 | Alg. 3 | 380 + 2800 | 4294967296 | $7.404 \cdot 10^{-5}$ |
| 6 | Alg. 3 | 4792 + 9941 | $1.844 \cdot 10^{19}$ | $1 \cdot 10^{-13}$ |
| 12 | Alg. 3 | 2728 + 40920 | $\infty$ | 0 |

Fig. 3. Best numbers of queries for Algorithm 2 and 3 relative to brute-force (Alg. 1).

| Lits | Heuristic setup | $\mathcal{T}_{\mathbb{Z}}$ time (s) | $\mathcal{T}_{\mathbb{Z}}$ queries (ou/in) | $\mathcal{T}_{\mathbb{R}}$ time (s) | $\mathcal{T}_{\mathbb{R}}$ queries (ou/in) |
| 3 | 10/2/0/✓ | 0.63 | 8/30 | 0.90 | 14/40 |
| 4 | 10/2/0/✓ | 16.14 | 308/500 | 11.19 | 125/560 |
| 5 | 20/2/0/✓ | 62.44 | 408/3220 | 88.55 | 357/3460 |
| 6 | 40/2/0/✓ | 678.71 | 2094/32760 | 722.64 | 1862/35840 |

Fig. 4. Comparison of $\mathcal{T}_{\mathbb{Z}}$ and $\mathcal{T}_{\mathbb{R}}$ for Syn (2,3) to Syn (2,6).

In Fig. 2 we also analyzed the constructed $\varphi_{\mathbb{B}}$, measuring the number of valid 
reactions from which it is made (Val.) and the time (Tme.) that a realizability 
checker takes to verify whether $\varphi_{\mathbb{B}}$ (hence, $\varphi_{\mathcal{T}}$) is realizable or not (expressed 
with dark and light gray colours, respectively). We used Strix [27] as the 
realizability checker. As we can see, there is a correspondence between the expected 
realizability of $\varphi_{\mathcal{T}}$ and the realizability result that Strix returns for $\varphi_{\mathbb{B}}$. Indeed, 
all instances can be solved in less than 7 seconds, and the length of the 
Boolean formula (characterized by the number of valid reactions) hardly affects 
performance. This suggests that future work should focus on reducing the time 
needed to produce the Boolean abstraction in order to scale even further. 

Also, note that Fig. 2 shows remarkable results regarding the ratio of queries required 
with respect to the (doubly exponential) brute-force algorithm: e.g., 4792 + 9941 
(outer + inner loops) out of the $1.844 \cdot 10^{19}$ queries that the brute-force algorithm 
would need, which is less than $1 \cdot 10^{-13}\%$ of them (see Fig. 3 for more details). We also 
compared the performance and number of queries for two different theories, $\mathcal{T}_\mathbb{Z}$ 
and $\mathcal{T}_\mathbb{R}$, for Syn (2,3) to Syn (2,6). Note, again, that the realizability result may 
vary if a specification is interpreted in different theories, but this is not relevant 
for the experiment in Fig. 4, which suggests that time results are not dominated 
by the SMT solver but, again, by the enclosing abstraction algorithms. 
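
The ratios quoted above can be rechecked with a few lines of arithmetic. The sketch below is only illustrative: the function names are ours, and it assumes that the brute-force algorithm needs $2^{2^n}$ queries for $n$ literals, which matches the "Out of" values 16, 256, 65536 and 4294967296 listed in Fig. 3.

```python
from fractions import Fraction


def brute_force_queries(num_literals: int) -> int:
    # Assumption: brute force issues 2^(2^n) queries for n literals,
    # consistent with the "Out of" column of Fig. 3.
    return 2 ** (2 ** num_literals)


def needed_percentage(outer: int, inner: int, num_literals: int) -> float:
    # Share (in %) of the brute-force queries that were actually performed.
    performed = outer + inner
    return float(Fraction(100 * performed, brute_force_queries(num_literals)))


if __name__ == "__main__":
    for lits, outer, inner in [(2, 4, 0), (3, 8, 0), (5, 380, 2800), (6, 4792, 9941)]:
        print(lits, needed_percentage(outer, inner, lits))
```

Exact rational arithmetic is used before the final conversion to a float so that the very small ratios for 5 and 6 literals are not distorted by intermediate rounding.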
6 Related Work and Conclusions 


Related Work. Constraint LTL [11] extends LTL with the possibility of 
expressing constraints between variables at bounded distance (of time). The 
theories considered are a restricted form of $\mathcal{T}_\mathbb{Z}$ with only comparisons, with 
additional restrictions to overcome undecidability. In comparison, we do not allow 
predicates to compare variables at different timesteps, but we prove decidability 
for all theories with an $\exists^*\forall^*$ decidable fragment. LTL modulo theories is studied 
in [12,19] for finite traces; these works allow temporal operators within predicates, 
which makes the logic undecidable. 

As for works closest to ours, [7] proposes numerical LTL synthesis using an 
interplay between an LTL synthesizer and a non-linear real arithmetic checker. 
However, [7] overapproximates the power of the system and hence it is not pre- 
cise for realizability. Linear arithmetic games are studied in [13] introducing 
algorithms for synthesizing winning strategies for non-reactive specifications. 
Also, [22] considers infinite theories (like us), but it does not guarantee success 
or termination, whereas our Boolean abstraction is complete. They only consider 
safety, while our approach considers all LTL. The follow-up [23] still has 
similar limitations: only liveness properties that can be reduced to safety are 
accepted, and it guarantees termination only for the unrealizability case. Similarly, 
[18] is incomplete, and requires a powerful solver for many quantifier alternations, 
which can be reduced to 1-alternation, but at the expense of the algorithm 
being no longer sound for the unrealizable case (e.g., depends on Z3 not 
answering “unknown”). As for [34], it (1) only considers safety/liveness GR(1) 
specifications, (2) is limited to the theory of fixed-size vectors, and requires (3) 
quantifier elimination and (4) guidance. We only require $\exists^*\forall^*$-satisfiability (for 
Boolean abstraction) and we consider multiple infinite theories. The main 
difference is that Boolean abstraction generates a (Boolean) LTL specification 
so that existing tools can be used with any of their internal techniques and algo- 
rithms (bounded synthesis, for example) and will automatically benefit from 
further optimizations. Moreover, it preserves fragments like safety and GR(1) so 
specialized solvers can be used. On the contrary, all approaches above adapt one 
specific technique and implement it in a monolithic way. 

Temporal Stream Logic (TSL) [16] extends LTL with complex data that can 
be related across time, making use of a new update operator $[y \leftarrowtail f\,x]$ to 
indicate that $y$ receives the result of applying function $f$ to variable $x$. TSL is later 
extended to theories in [15,25]. In all these works, realizability is undecidable. 
Also, in [8] reactive synthesis and syntax-guided synthesis (SyGuS) [1] collaborate 
in the synthesis process, and generate executable code that guarantees 
reactive and data-level properties. It also suffers from undecidability: both due 
to the undecidability of TSL [16] and of SyGuS [6]. In comparison, we cannot 
relate values across time but we provide a decidable realizability procedure. 

Comparing TSL with LTL$_\mathcal{T}$, TSL is undecidable already for safety, the 
theory of equality and Presburger arithmetic. More precisely, TSL is only known to 
be decidable for three fragments (see Thm. 7 in [15]). TSL is (1) semi-decidable 
for the reachability fragment of TSL (i.e., the fragment of TSL that only permits 
the next operator and the eventually operator as temporal operators); (2) decidable 
for formulae consisting of only logical operators, predicates, updates, next 
operators, and at most one top-level eventually operator; and (3) semi-decidable 
for formulae with one cell (i.e., controllable outputs). None of the specifications 
considered for empirical evaluation in Sect. 5 are within the considered decidable 
or semi-decidable fragments. Also, TSL allows (finite) uninterpreted predicates, 
whereas we need to have predicates well defined within the semantics of theories 
of specifications for which we perform Boolean abstraction. 


Conclusion. The main contribution of this paper is to show that LTL$_\mathcal{T}$ is 
decidable via a Boolean abstraction technique for all theories of data with a 
decidable $\exists^*\forall^*$ fragment. Our algorithms create, from a given LTL$_\mathcal{T}$ specification 
where atomic propositions are literals in such a theory, an equi-realizable 
specification with Boolean atomic propositions. We have also introduced 
algorithms that use SAT solvers to traverse the search space efficiently. A 
SAT formula encodes the space of reactions to be explored, and our algorithms 
reduce this space by learning uninteresting areas from each reaction explored. 
The fastest algorithm uses a two-layer nested SAT encoding, in a DPLL(T) 
fashion. This search yields dramatically more efficient running times and makes 
Boolean abstraction applicable to larger cases. We have performed an empirical 
evaluation of implementations of our algorithms. We found empirically that the 
best performances are obtained when there is a balance in the number of queries 
made by each layer of the SAT search. To the best of our knowledge, this is the 
first method to propose a solution (and an efficient one) to realizability for general 
$\exists^*\forall^*$-decidable theories, which include, for instance, the theories of integers and reals. 
Future work includes, first, how to improve scalability further. We plan to 
leverage quantifier elimination procedures [9] to produce candidates for the sets 
of valid reactions and then check (and correct) them with faster algorithms. Also, 
optimizations based on quasi-reactions can be enhanced if state-of-the-art tools for 
satisfiability core search (e.g., [2,3,24]) are used. Another direction is to extend 
our realizability method into a synthesis procedure by synthesizing functions 
in $\mathcal{T}$ that produce witness values of variables controlled by the system given (1) 
environment and system moves in the Boolean game, and (2) environment values 
(consistent with the environment move). Finally, we plan to study how to extend 
LTL$_\mathcal{T}$ with controlled transfer of data across time while preserving decidability. 
Abstract. We present a certified algebraic abstraction technique for verifying 
bit-accurate non-linear integer computations. In algebraic abstraction, programs 
are lifted to polynomial equations in the abstract domain. Algebraic techniques 
are employed to analyze abstract polynomial programs; SMT QF_BV solvers are 
adopted for bit-accurate analysis of soundness conditions. We explain how to 
verify our abstraction algorithm and certify verification results. Our hybrid tech- 
nique has verified non-linear computations in various security libraries such as 
BITCOIN and OPENSSL. We also report the certified verification of Number- 
Theoretic Transform programs from the post-quantum cryptosystem KYBER. 


1 Introduction 


Bit-accurate non-linear integer computations are infamously hard to verify. Conven- 
tional bit-accurate techniques such as bit blasting do not work well for non-linear 
computations. Approximation techniques through floating-point computation on the 
other hand are inaccurate. Non-linear integer computation nonetheless is essential to 
computer cryptography. Analyzing complex non-linear computation in cryptographic 
libraries is still one of the most challenging problems of the utmost importance today. 

In this paper, we address the verification problem through algebraic abstraction. 
In algebraic abstraction, abstract programs are represented by polynomial equations. 
Non-linear computation about abstract polynomial programs is analyzed algebraically 
and hence more efficiently through techniques from commutative algebra. Algebraic 
abstraction however is unsound due to overflow in bounded integer computation. We 
characterize soundness conditions with queries using the Quantifier-Free Bit-Vector 
(QF_BV) logic from Satisfiability Modulo Theories (SMT) [2]. SMT solvers are then 
used to check soundness conditions before applying algebraic abstraction. 

Our hybrid technique takes advantage of both algebraic and bit-accurate analyses. 
Non-linear algebraic properties are verified algebraically. Polynomials are computed 
and analyzed by algorithms from commutative algebra. Coefficients, variables and 
arithmetic functions are atomic in such algorithms. Our algebraic analysis is hence very 
efficient for non-linear computation. Soundness conditions, on the other hand, require 
bit-accurate analysis. Our technique applies SMT QF_BV solvers to check soundness 
conditions. By combining algebraic with bit-accurate analyses, algebraic abstraction 
successfully verifies non-linear computation in real-world cryptographic programs. 
Cryptographic programs undoubtedly are widely deployed critical software. Errors 
in their verification need to be minimized. To this end, we use the proof assistant 
COQ [4] to verify the soundness theorem for algebraic abstraction. To ensure the cor- 
rectness of external algebraic and bit-accurate analysis tools, results from external tools 
are certified in our technique as well. With verified abstraction and certified external 
results, verification of bit-accurate non-linear integer computation through algebraic 
abstraction is certified. We explain how to certify our hybrid verification technique. 
We evaluate our certified technique with cryptographic programs from secu- 
rity libraries in BITCOIN [27], BORINGSSL [8,12], NSS [20], OPENSSL [23] and 
PQCRYPTO-SIDH [18]. These programs compute field and group operations in elliptic 
curve cryptography. We also verify Number-Theoretic Transform (NTT) programs 
from the post-quantum cryptosystem KYBER [6]. In lattice-based post-quantum cryp- 
tography, computation in polynomial rings is needed. NTT is a discrete variant of the 
Fast Fourier Transform used for polynomial multiplication in KYBER. Our certified 
algebraic abstraction technique verifies cryptographic programs from elliptic curve and 
post-quantum cryptography successfully. Our contributions are summarized as follows. 


— We detail algebraic abstraction for checking non-linear modular equations with mul- 
tiple moduli; 

— We certify algebraic abstraction and its verification; 

— We report certified verification results for 39 real-world cryptographic programs in 
elliptic curve and post-quantum cryptography. 


Related Work. GFVERIF employs an ad hoc technique to verify non-linear computation 
in cryptographic programs with a computer algebra system [3]. CRYPTOLINE [9, 24, 29] 
is a tool designed for the specification and verification of cryptographic assembly codes. 
Its verification algorithm utilizes computer algebra systems in addition to SMT solvers. 
CRYPTOLINE is also leveraged to verify cryptographic C programs [9,17]. The opti- 
mized KYBER NTT program for avx2 is verified in [15], but the underlying verifi- 
cation algorithm is left unexplained. None of these works certified their verification 
results. Users had to trust these verification tools. BVCRYPTOLINE certifies algebraic 
abstraction but not soundness conditions [29]. It does not allow multiple moduli in 
modular equations either. Particularly, it cannot concisely specify NTT by the Chi- 
nese remainder theorem over polynomial rings. Compared with these works, our tech- 
nique admits modular equations with multiple moduli in assumptions and assertions, 
and is fully certified. To explicate our advantages, consider the specification of 
multiplication in the field $\mathbb{Z}_{p434}/\langle x^2 + 1\rangle$ where $p434$ is a prime number. An element 
in the field is of the form $u_0 + u_1 x$ where $x^2 + 1 = 0$. To specify that $r_0 + r_1 x$ is the 
product of $u_0 + u_1 x$ and $v_0 + v_1 x$, one can write two modular equations with one 
modulus: $r_0 \equiv u_0 v_0 - u_1 v_1 \bmod [p434]$ and $r_1 \equiv u_0 v_1 + u_1 v_0 \bmod [p434]$. With 
multiple moduli, we write $r_0 + r_1 x \equiv (u_0 + u_1 x)(v_0 + v_1 x) \bmod [p434, x^2 + 1]$ 
succinctly. Our simple specifications are most useful for complicated fields such as 
$\mathbb{Z}_{p381}/\langle x^2 + 1, y^3 - x - 1, z^2 - y\rangle$. Each element of the complex field is of the form 
$\sum u_{ijk}\, x^i y^j z^k$ with $0 \le i, k < 2$ and $0 \le j < 3$. Twelve modular equations were 
needed previously. One modular equation with multiple moduli suffices to specify its 
field multiplication in this work. Furthermore, our technique is verified in COQ. The 
correctness of our abstraction algorithm and soundness theorem are formally proven in 
Coq. We also show how to certify results from external tools. In summary, the cor- 
rectness of algebraic abstraction algorithm is verified and answers from external tools 
are certified. Verification results are therefore fully certified. We believe this is the best 
guarantee a model checker can offer. Our verified model checker is sufficiently practical 
to verify industrial cryptographic programs too! 

Analysis of linear polynomial programs was discussed, for instance, in [21,22]. The 
reduction from the root entailment problem to the ideal membership problem is dis- 
cussed in [14]. In this work, the computer algebra system SINGULAR [13] is employed 
to compute standard bases of ideals and certificates. The certified SMT QF_BV solver 
COQQFBV [26] is adopted to certify soundness conditions. 

The paper is organized as follows. Section 2 gives the needed background. It is 
followed by the syntax and semantics of the language TOYLANG. An implementation of 
the unsigned Montgomery reduction is given as a running example (Sect. 3). Section 4 
presents algebraic abstraction and its verification algorithms. We briefly describe certi- 
fied verification of algebraic abstraction in Sect. 5. Section 6 shows experimental results 
of real-world cryptographic programs. We conclude in Sect. 7. 


2 Preliminaries 


Let $\mathbb{N}$ and $\mathbb{Z}$ denote the sets of non-negative integers and all integers respectively. Fix a set of 
variables $X$. We write $\mathbb{Z}[X]$ for the set of polynomials in variables $X$ with coefficients in 
$\mathbb{Z}$. A polynomial equation is of the form $e = e'$ with $e, e' \in \mathbb{Z}[X]$; a polynomial modular 
equation is of the form $e \equiv e' \bmod [f_0, f_1, \ldots, f_m]$ with $e, e', f_0, f_1, \ldots, f_m \in \mathbb{Z}[X]$. 
A valuation $\rho$ of $X$ is a mapping from $X$ to $\mathbb{Z}$. Given a valuation $\rho$, a polynomial $e$ 
evaluates to the integer $e[\rho]$ by replacing every variable $x$ with $\rho(x)$. A valuation $\rho$ 
is a root of the equation $e = e'$ if $(e - e')[\rho] = 0$. A valuation $\rho$ is a root of the 
modular equation $e \equiv e' \bmod [f_0, f_1, \ldots, f_m]$ if $(e - e')[\rho] = z_0 f_0[\rho] + z_1 f_1[\rho] + 
\cdots + z_m f_m[\rho]$ for some $z_0, z_1, \ldots, z_m \in \mathbb{Z}$. A (modular) equation is an equation or a 
modular equation. A system of (modular) equations is a set of (modular) equations. A 
root of a system of (modular) equations is a common root of every (modular) equation 
in the system. Let $\Phi$ be a system of (modular) equations and $\phi$ a (modular) equation; 
roots of $\Phi$ entail roots of $\phi$ (written $\forall x.\, \Phi \Rightarrow \phi$) if all roots of $\Phi$ are also roots of $\phi$. 
Given $\Phi$ and $\phi$, the root entailment problem is to decide whether $\forall x.\, \Phi \Rightarrow \phi$. 

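An ideal in $\mathbb{Z}[X]$ generated by $f_0, f_1, \ldots, f_m \in \mathbb{Z}[X]$ is defined by 
$\langle f_0, f_1, \ldots, f_m \rangle = \{ f_0 h_0 + f_1 h_1 + \cdots + f_m h_m \mid h_0, h_1, \ldots, h_m \in \mathbb{Z}[X] \}$. If 
$\langle f_0, f_1, \ldots, f_m \rangle$ and $\langle g_0, g_1, \ldots, g_n \rangle$ are ideals, define their sum $\langle f_0, f_1, \ldots, f_m \rangle + 
\langle g_0, g_1, \ldots, g_n \rangle = \langle f_0, f_1, \ldots, f_m, g_0, g_1, \ldots, g_n \rangle$. For instance, $\langle x \rangle = \{ x f \mid f \in \mathbb{Z}[x] \}$ 
and $\langle 6 \rangle + \langle 10 \rangle = \langle 2 \rangle$. Given $f \in \mathbb{Z}[X]$ and an ideal $I$, the ideal membership problem is 
to decide whether $f \in I$. 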
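
For intuition, the special case of ideals generated by integer constants can be checked directly, because such an ideal of $\mathbb{Z}$ is generated by the greatest common divisor of its generators. The following minimal sketch covers only this integer case (not general polynomial ideals in $\mathbb{Z}[X]$), and the function names are ours.

```python
from functools import reduce
from math import gcd


def ideal_generator(generators):
    # In Z, the ideal <g0, ..., gn> equals <gcd(g0, ..., gn)>.
    return reduce(gcd, generators, 0)


def is_member(f, generators):
    # Ideal membership for integers: f is in <g0, ..., gn> iff gcd divides f.
    g = ideal_generator(generators)
    return f == 0 if g == 0 else f % g == 0


def is_root_mod(lhs, rhs, moduli):
    # Integer instance of a modular equation lhs = rhs mod [f0, ..., fm]:
    # lhs - rhs must be an integer combination of the moduli.
    return is_member(lhs - rhs, moduli)


if __name__ == "__main__":
    print(ideal_generator([6, 10]))   # generator of <6> + <10>
    print(is_root_mod(9, 1, [4, 6]))  # is 9 - 1 in <4, 6>?
```

For ideals over polynomials, membership is decided with standard bases computed by a computer algebra system, as discussed in the related work; the gcd shortcut above does not extend to that setting.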


332 M.-H. Tsai et al. 


A bit-vector is a bit sequence of a width $w$. A bit-vector denotes an integer between 
0 and $2^w - 1$ inclusively using the most-significant-bit-first representation. The SMT 
QF_BV logic defines bit-vector functions. Assume $bv_0$ and $bv_1$ are bit-vectors of width 
$w$. The addition (bvadd $bv_0$ $bv_1$) and subtraction (bvsub $bv_0$ $bv_1$) functions return 
bit-vectors of width $w$ representing the sum and difference respectively. The 
multiplication function (bvmul $bv_0$ $bv_1$) returns the least significant $w$ bits of the product. The 
left shift function (bvshl $bv_0$ $n$) shifts $bv_0$ to the left by $n$ bits; the logical right shift 
function (bvlshr $bv_0$ $n$) shifts $bv_0$ to the right by $n$ bits. The zero extension 
function (zero_extend $bv_0$ $n$) appends $n$ most significant 0's to $bv_0$. The extraction 
function (bvextract $h$ $l$ $bv_0$) extracts bits indexed $h$ to $l$ from $bv_0$ ($w > h \ge l \ge 0$). 
An SMT QF_BV expression is constructed from bit-vector values, variables, and 
functions. An SMT QF_BV assertion is of the form (assert $\bot$), (assert (= $be$ $be'$)), or 
(assert (not (= $be$ $be'$))) with SMT QF_BV expressions $be$ and $be'$. An SMT QF_BV 
query is a set of SMT QF_BV assertions. A store is a mapping from bit-vector 
variables to bit-vector values. An SMT QF_BV expression evaluates to a bit-vector 
value on a store. An SMT QF_BV assertion (assert (= $be$ $be'$)) is satisfied by a 
store if $be$ and $be'$ evaluate to the same bit-vector value on the store, and otherwise 
(assert (not (= $be$ $be'$))) is satisfied. The SMT QF_BV assertion (assert $\bot$) is never 
satisfied. An SMT QF_BV query is satisfiable if all assertions are satisfied by a store. 
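
The bit-vector functions above can be mimicked on ordinary integers. The sketch below is an illustrative model only: the `BV` class and the Python function signatures are our own assumptions, chosen to follow the simplified notation of this section rather than any solver's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BV:
    """A bit-vector: an unsigned value together with its width."""
    value: int
    width: int

    def __post_init__(self):
        assert 0 <= self.value < (1 << self.width)


def bvadd(x: BV, y: BV) -> BV:
    assert x.width == y.width
    return BV((x.value + y.value) % (1 << x.width), x.width)


def bvsub(x: BV, y: BV) -> BV:
    assert x.width == y.width
    return BV((x.value - y.value) % (1 << x.width), x.width)


def bvmul(x: BV, y: BV) -> BV:
    # Keeps only the least significant w bits of the product.
    assert x.width == y.width
    return BV((x.value * y.value) % (1 << x.width), x.width)


def bvshl(x: BV, n: int) -> BV:
    return BV((x.value << n) % (1 << x.width), x.width)


def bvlshr(x: BV, n: int) -> BV:
    return BV(x.value >> n, x.width)


def zero_extend(x: BV, n: int) -> BV:
    # Same value, n extra most significant zero bits.
    return BV(x.value, x.width + n)


def bvextract(h: int, l: int, x: BV) -> BV:
    assert x.width > h >= l >= 0
    return BV((x.value >> l) & ((1 << (h - l + 1)) - 1), h - l + 1)


if __name__ == "__main__":
    a = BV(128, 8)
    wide = bvadd(zero_extend(a, 1), zero_extend(a, 1))
    print(bvadd(a, a), bvextract(8, 8, wide))
```

Extending both operands by one bit before adding, then extracting bit $w$, recovers the carry that the $w$-bit sum discards.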


3 TOYLANG 


We consider a register transfer language called TOYLANG to illustrate algebraic 
abstraction. For clarity, many programming constructs are removed from TOYLANG. 
The language nevertheless is sufficiently expressive to implement Montgomery reduc- 
tion [19], an indispensable algorithm found in real-world cryptographic programs. 


3.1 Syntax and Semantics 


The syntax of TOYLANG is shown in Fig. 1. For simplicity, we assume all numbers are 
unsigned and all variables are of widths 1 or w. Variables of width 1 are also called bit 
variables. An atom is a number or a variable. 


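$$
\begin{array}{lrcl}
\text{Num} & n & ::= & 0 \mid 1 \mid 2 \mid \cdots \\
\text{Var} & c, v & ::= & a \mid b \mid c \mid \cdots \\
\text{Atom} & a & ::= & \text{Num} \mid \text{Var} \\
\text{Inst} & s & ::= & v \leftarrow \text{ADD}\ a_0\ a_1 \mid v \leftarrow \text{ADC}\ a_0\ a_1\ d \mid v \leftarrow \text{SUB}\ a_0\ a_1 \mid \\
 & & & c : v \leftarrow \text{ADDS}\ a_0\ a_1 \mid c : v \leftarrow \text{ADCS}\ a_0\ a_1\ d \mid c : v \leftarrow \text{SUBS}\ a_0\ a_1 \mid \\
 & & & v \leftarrow \text{MUL}\ a_0\ a_1 \mid v \leftarrow \text{SHL}\ a_0\ n \mid \text{ASSUME}\ q \mid \\
 & & & v_H : v_L \leftarrow \text{MULL}\ a_0\ a_1 \mid v \leftarrow \text{SHR}\ a_0\ n \mid \text{ASSERT}\ q \\
\text{Exp} & e, f & ::= & n \mid v \mid e_0 + e_1 \mid e_0 - e_1 \mid e_0 \times e_1 \mid e_0 \\
\text{MEqn} & q & ::= & e_0 = e_1 \mid e_0 \equiv e_1 \bmod [f_0, f_1, \ldots] \\
\text{Program} & P & ::= & s \mid s\ P
\end{array}
$$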


Fig. 1. TOYLANG — Syntax 
TOYLANG supports several arithmetic instructions: addition (ADD), carrying addition 
(ADDS), addition-with-carry (ADC), carrying addition-with-carry (ADCS), sub- 
traction (SUB), borrowing subtraction (SUBS), half- (MUL) and full-multiplication 
(MULL). Moreover, logical left shift (SHL) and logical right shift (SHR) instructions are 
allowed. In addition to assignments, (modular) equations can be specified in assumption 
(ASSUME) or assertion (ASSERT) instructions. A program is a sequence of instructions. 
We assume ASSERT instructions can only appear at the end of programs. They specify 
a (modular) equation to be verified and thus are emphasized with a framed box. 


bv = bvadd [ao]e [ail bv = bvadd (bvadd | ao] [a1].) (zero-extend |d]o (w — 1)) 


(jo, v + ADD ao ai, o[v + boj) (a,v + ADC ao ai d,a[v + bul) 


bux = bvadd (zero_extend [ao] 1) (zero-extend [ai], 1) 


(a,c: v + ADDS ao a1, a[c + bvextract w w bux, v ++ bvextract (w — 1) 0 bua]) 


bux = bvadd (bvadd (zero_extend [ao]o 1) (zero-extend a1]. 1)) (zero_extend |d]o w)) 


(a,c: v + ADCS ao a1 d, a[c + bvextract w w bux, v + bvextract (w — 1) 0 bvz]) 
bu = bysub [aolc [ai]o 


(jo, v + SUB ao ai, o[v + bv] 


bux = bvsub (zero_extend [ao] 1) (zero_extend [ai], 1) 


(0, c : v © SUBS ao a1, a[c + bvextract w w bux, v ++ bvextract (w — 1) 0 bvz))) 
bv = byshl [ao], n bv = bvlshr [ao], n bv = bymul [ao] [ai] 


(lo, v + SHL ao n, a[v + bul) (a, v + SHR ao n, a[v + bvl) (jo, v + MUL ao ai, o[v + bul) 


bu = bvmul (zero_extend [ao]o w) (zero_extend [ar]o w) 


(lo, vH : UL + MULL ao a1, o[vH +> bvextract (2w — 1) w bu, vz > bvextract (w — 1) 0 bv] ) 


Hg oFq oF (o,s,0"))  (o”,P,o') 
(jo, ASSUME q,a) (Jo,] ASSERT q|,o)) — (\o,| ASSERT q |, fail) (o,s P,o’) 


Fig. 2. TOYLANG — Semantics 


Let $\sigma$ be a store. We write $\sigma[v \mapsto bv]$ for the store obtained by mapping $v$ to the 
bit-vector $bv$ and other variables $u$ to $\sigma(u)$. $\llbracket v \rrbracket_\sigma$ represents the bit-vector $\sigma(v)$ for any 
variable $v$; otherwise, $\llbracket n \rrbracket_\sigma$ is the bit-vector of width $w$ representing the number $n$. 

The semantics of TOYLANG is defined with SMT QF_BV bit-vector functions 
(Fig. 2). In the figure, $(|\sigma, s, \sigma'|)$ denotes that the store $\sigma'$ is obtained after executing the 
instruction $s$ on the store $\sigma$. The addition instruction ADD corresponds to the bit-vector 
addition function. For the addition with carry instruction, the carry bit is extended with 
$w - 1$ zeros and added to the sum of the first two operands. The two carrying addition 
instructions compute the bit-vector sums of width $w + 1$. The most significant 
bit is stored in the output carry bit. Subtraction instructions are similar; their semantics 
are defined with the bit-vector subtraction function bvsub instead. The semantics of SHL 
and SHR instructions are defined by corresponding bit-vector functions bvshl and bvlshr 
respectively. The semantics of half-multiplication instruction MUL uses the bit-vector 
multiplication function bvmul. For full-multiplication, both operands are extended to 
width $2w$ before computing their product. 


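$$\{|n|\}_\sigma = n \qquad \{|v|\}_\sigma = \text{toZ}(\llbracket v \rrbracket_\sigma)$$

$$\{|e_0 \pm e_1|\}_\sigma = \{|e_0|\}_\sigma \pm \{|e_1|\}_\sigma \qquad \{|e_0 \times e_1|\}_\sigma = \{|e_0|\}_\sigma \times \{|e_1|\}_\sigma$$

$$\dfrac{\{|e_0|\}_\sigma = \{|e_1|\}_\sigma}{\sigma \models e_0 = e_1} \qquad \dfrac{\{|e_0|\}_\sigma - \{|e_1|\}_\sigma \in \langle \{|f_0|\}_\sigma, \{|f_1|\}_\sigma, \ldots, \{|f_m|\}_\sigma \rangle}{\sigma \models e_0 \equiv e_1 \bmod [f_0, f_1, \ldots, f_m]}$$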


Fig. 3. Semantics of (Modular) Equations 


The ASSUME instruction filters computations by (modular) equations. Figure 3 
defines when a store satisfies a (modular) equation. A number $n$ denotes a non-negative 
integer. A variable denotes the integer $\text{toZ}(\llbracket v \rrbracket_\sigma)$ represented by the corresponding 
bit-vector $\llbracket v \rrbracket_\sigma$ in the store. Arithmetic operations denote corresponding integer operations. 
Particularly, the integer $\{|e|\}_\sigma$ is exact and not necessarily less than $2^w$. Equality denotes 
integer equality. $\sigma$ satisfies $e_0 \equiv e_1 \bmod [f_0, f_1, \ldots, f_m]$ if $\{|e_0|\}_\sigma - \{|e_1|\}_\sigma$ is in the 
ideal generated by $\{|f_0|\}_\sigma, \{|f_1|\}_\sigma, \ldots, \{|f_m|\}_\sigma$. The ASSERT instruction checks if the 
current store satisfies the given (modular) equation. The computation resumes if it 
succeeds. It is an error if the ASSERT instruction fails. 

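
(a) Algorithm 

(* $R = 2^{64}$, $0 \le T < R^2$ *) 
(* $N \cdot N' + 1 \equiv 0 \bmod R$ *) 
$m \leftarrow ((T \bmod R) \cdot N') \bmod R$ 
$t \leftarrow (T + m \cdot N)/R$ 
(* $t \cdot R \equiv T \bmod N$ *) 

(b) TOYLANG code 

(* $T = 2^{64} T_H + T_L$ *) 
ASSUME $N \times N' + 1 \equiv 0 \bmod [2^{64}]$ 
$m \leftarrow$ MUL $T_L$ $N'$ 
$mN_H : mN_L \leftarrow$ MULL $m$ $N$ 
$carry : t_L \leftarrow$ ADDS $T_L$ $mN_L$ 
$c : t_H \leftarrow$ ADCS $T_H$ $mN_H$ $carry$ 
ASSERT $t_L \equiv 0 \bmod [2^{64}]$ 
ASSUME $t_L = 0$ 
ASSERT $(c \times 2^{64} + t_H) \times 2^{64} \equiv T_H \times 2^{64} + T_L \bmod [N]$ 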


Fig. 4. Simplified Montgomery Reduction 


The Montgomery reduction algorithm is widely used to compute remainders without 
division [19]. Figure 4a shows a simplified unsigned Montgomery reduction algorithm.¹ 
Suppose we want to compute the remainder of a number $0 \le T < R^2$ modulo $N$ on 
64-bit architectures with $R = 2^{64}$. The Montgomery reduction algorithm needs another number 
$N'$ with $NN' + 1 \equiv 0 \bmod R$ as an input. It first computes $m = ((T \bmod R)N') \bmod 
R$ and then $t = (T + mN)/R$. Observe that the remainder and quotient divided by 

1 The complete algorithm requires range analysis not discussed in this work. 
$R = 2^{64}$ amount to bit masking and shifting respectively. Arithmetic division is never 
used. To prove $tR \equiv T \bmod N$, we first show $T + mN \equiv 0 \bmod R$. Observe $T + 
mN = T + (((T \bmod R)N') \bmod R)N \equiv T + TN'N = T(1 + N'N) \equiv 0 \bmod R$. 
Therefore, $T + mN$ is a multiple of $R$ and $t = (T + mN)/R$ is an integer. Hence 
$tR = T + mN \equiv T \bmod N$. 

In the TOYLANG implementation (Fig. 4b), we represent $T$ by two 64-bit variables 
$T_H$ and $T_L$ with $T = 2^{64} T_H + T_L$. Hence $T_L = T \bmod 2^{64}$. $m$ is computed by the 
half-multiplication instruction MUL. The full-multiplication computes the product $mN$ of $m$ 
and $N$. The following two addition instructions compute the sum of $T$ and the product 
$mN$. After adding $T$, the least significant 64 bits ($t_L$) should be zeros. We hence assert 
$t_L \equiv 0 \bmod [2^{64}]$. If the assertion succeeds, $t_L$ is in fact 0 since it is a 64-bit variable. 
We thus assume $t_L = 0$. The last assertion checks that the result $2^{64}(2^{64}c + t_H)$ is 
indeed congruent to $T$ modulo $N$. 
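
The instruction sequence of Fig. 4b can be replayed on Python integers to see both assertions hold on concrete inputs. This is a sketch under stated assumptions: the function names are ours, 64-bit registers are modeled by masking, and `neg_inverse` (which computes a suitable $N'$ for odd $N$) is a helper that is not part of the TOYLANG program.

```python
W = 64
R = 1 << W
MASK = R - 1


def neg_inverse(n: int) -> int:
    # Hypothetical helper: N' with N * N' + 1 = 0 mod R (requires odd N).
    return (-pow(n, -1, R)) % R


def montgomery_reduce(t_hi: int, t_lo: int, n: int, n_prime: int):
    # ASSUME N x N' + 1 = 0 mod [2^64]
    assert (n * n_prime + 1) % R == 0
    # m <- MUL T_L N'            (low 64 bits of the product)
    m = (t_lo * n_prime) & MASK
    # mN_H : mN_L <- MULL m N    (full 128-bit product, split in halves)
    prod = m * n
    mn_hi, mn_lo = prod >> W, prod & MASK
    # carry : t_L <- ADDS T_L mN_L
    s = t_lo + mn_lo
    carry, res_lo = s >> W, s & MASK
    # c : t_H <- ADCS T_H mN_H carry
    s = t_hi + mn_hi + carry
    c, res_hi = s >> W, s & MASK
    # ASSERT t_L = 0 mod [2^64]
    assert res_lo % R == 0
    # ASSERT (c x 2^64 + t_H) x 2^64 = T_H x 2^64 + T_L mod [N]
    t = (t_hi << W) + t_lo
    assert (((c << W) + res_hi) * R - t) % n == 0
    return c, res_hi


if __name__ == "__main__":
    n = (1 << 61) - 1
    print(montgomery_reduce(0x0123456789ABCDEF, 0xFEDCBA9876543210, n, neg_inverse(n)))
```

Running the function on a handful of inputs is only a sanity check; it does not replace the verification of the assertions for all inputs that the paper is concerned with.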


4 Algebraic Abstraction 


Algebraic abstraction is a technique to lift computation to an algebraic domain. In the 
abstract algebraic domain, program instructions are transformed to polynomial equa- 
tions. Computation in turn is characterized by the roots of systems of polynomial equa- 
tions. Algebraic abstraction hence allows us to apply algebraic tools from commutative 
algebra. The abstraction technique requires programs in the static single assignment 
form. We hence assume input programs are in the static single assignment form. 


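$$
\begin{array}{rcl}
\llbracket v \leftarrow \text{ADD}\ a_0\ a_1 \rrbracket & = & \{ v = a_0 + a_1 \} \\
\llbracket v \leftarrow \text{ADC}\ a_0\ a_1\ d \rrbracket & = & \{ v = a_0 + a_1 + d \} \\
\llbracket c : v \leftarrow \text{ADDS}\ a_0\ a_1 \rrbracket & = & \{ c \cdot (c - 1) = 0,\ c \cdot 2^w + v = a_0 + a_1 \} \\
\llbracket c : v \leftarrow \text{ADCS}\ a_0\ a_1\ d \rrbracket & = & \{ c \cdot (c - 1) = 0,\ c \cdot 2^w + v = a_0 + a_1 + d \} \\
\llbracket v \leftarrow \text{SUB}\ a_0\ a_1 \rrbracket & = & \{ v = a_0 - a_1 \} \\
\llbracket v \leftarrow \text{MUL}\ a_0\ a_1 \rrbracket & = & \{ v = a_0 \cdot a_1 \} \\
\llbracket c : v \leftarrow \text{SUBS}\ a_0\ a_1 \rrbracket & = & \{ c \cdot (c - 1) = 0,\ v = a_0 - a_1 + c \cdot 2^w \} \\
\llbracket v_H : v_L \leftarrow \text{MULL}\ a_0\ a_1 \rrbracket & = & \{ v_H \cdot 2^w + v_L = a_0 \cdot a_1 \} \\
\llbracket v \leftarrow \text{SHL}\ a\ n \rrbracket & = & \{ v = a \cdot 2^n \} \\
\llbracket v \leftarrow \text{SHR}\ a\ n \rrbracket & = & \{ v \cdot 2^n = a \} \\
\llbracket \text{ASSUME}\ q \rrbracket & = & \{ q \} \\
\llbracket s\ P \rrbracket & = & \llbracket s \rrbracket \cup \llbracket P \rrbracket
\end{array}
$$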


Fig. 5. Algebraic Abstraction 


Figure 5 lifts TOYLANG instructions to polynomial equations. Intuitively, we would 
like the semantics of each instruction characterized by roots of corresponding 
polynomial equations. For instance, $v \leftarrow \text{ADD}\ a_0\ a_1$ is lifted to $v = a_0 + a_1$. The ADC 
instruction is similar. The carrying addition instruction $c : v \leftarrow \text{ADDS}\ a_0\ a_1$ is lifted to 
two equations: $c \cdot (c - 1) = 0$ and $c \cdot 2^w + v = a_0 + a_1$. Since $c$ is a carry, it must be 
0 or 1, and hence a root of $c \cdot (c - 1) = 0$. The carrying addition-with-carry instruction 
ADCS is similar, as well as subtraction instructions SUB and SUBS. 

The half-multiplication instruction $v \leftarrow \text{MUL}\ a_0\ a_1$ is lifted to $v = a_0 \cdot a_1$; the 
full-multiplication instruction $v_H : v_L \leftarrow \text{MULL}\ a_0\ a_1$ corresponds to $v_H \cdot 2^w + v_L = a_0 \cdot a_1$. 
ASSUME $N \times N' + 1 \equiv 0 \bmod [2^{64}]$        $N \times N' + 1 \equiv 0 \bmod [2^{64}]$, 
$m \leftarrow$ MUL $T_L$ $N'$        $m = T_L \cdot N'$, 
$mN_H : mN_L \leftarrow$ MULL $m$ $N$        $mN_H \cdot 2^{64} + mN_L = m \cdot N$, 
$carry : t_L \leftarrow$ ADDS $T_L$ $mN_L$        $carry \cdot (carry - 1) = 0$, 
                                   $carry \cdot 2^{64} + t_L = T_L + mN_L$, 
$c : t_H \leftarrow$ ADCS $T_H$ $mN_H$ $carry$        $c \cdot (c - 1) = 0$, 
                                   $c \cdot 2^{64} + t_H = T_H + mN_H + carry$, 
ASSERT $t_L \equiv 0 \bmod [2^{64}]$ 
ASSUME $t_L = 0$        $t_L = 0$ 
ASSERT $(c \times 2^{64} + t_H) \times 2^{64} \equiv T_H \times 2^{64} + T_L \bmod [N]$ 


Fig. 6. Abstract Montgomery Reduction 


The logical left shift instruction $v \leftarrow \text{SHL}\ a\ n$ corresponds to $v = a \cdot 2^n$; the logical 
right shift instruction $v \leftarrow \text{SHR}\ a\ n$ is lifted to $v \cdot 2^n = a$. The ASSUME $q$ instruction is 
lifted to the (modular) equation $q$. All computations thus must satisfy $q$. A TOYLANG 
program is lifted to the system of (modular) equations from its instructions. The system 
of (modular) equations is called the abstract polynomial program. Figure 6 shows the 
abstract polynomial program for the Montgomery reduction program. 
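
As an illustration of how the abstract polynomial program supports the two assertions, the following derivation works directly from the equations lifted from the Montgomery reduction instructions. It is a hand-written sketch of the entailments, not the output of the verification algorithm. For the first assertion, no assumption on $t_L$ is needed:

```latex
\begin{align*}
t_L &= T_L + mN_L - carry \cdot 2^{64}
      && \text{(ADDS equation)} \\
    &= T_L + m \cdot N - mN_H \cdot 2^{64} - carry \cdot 2^{64}
      && \text{(MULL equation)} \\
    &\equiv T_L + T_L \cdot N' \cdot N = T_L \cdot (1 + N \cdot N') \equiv 0 \pmod{2^{64}}
      && \text{(MUL equation, ASSUME)}
\end{align*}
```

For the final assertion, the assumed equation $t_L = 0$ is used:

```latex
\begin{align*}
(c \cdot 2^{64} + t_H) \cdot 2^{64}
  &= (T_H + mN_H + carry) \cdot 2^{64}
      && \text{(ADCS equation)} \\
  &= T_H \cdot 2^{64} + mN_H \cdot 2^{64} + T_L + mN_L - t_L
      && \text{(ADDS equation)} \\
  &= T_H \cdot 2^{64} + T_L + m \cdot N
      && \text{(MULL equation, } t_L = 0\text{)} \\
  &\equiv T_H \cdot 2^{64} + T_L \pmod{N}
\end{align*}
```

Neither derivation uses the equations $carry \cdot (carry - 1) = 0$ or $c \cdot (c - 1) = 0$; both hold for any integer values satisfying the remaining equations.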


4.1 Soundness Conditions 


Algebraic abstraction in Fig. 5 however is unsound. The TOYLANG semantics is defined 
over bounded integers of bit width $w$. Polynomial equations in algebraic abstraction are 
interpreted over integers. When overflow occurs in TOYLANG instructions, for instance, 
its computation is not captured by corresponding polynomial equations. Consider the 
instruction $v \leftarrow \text{ADD}\ 2^{w-1}\ 2^{w-1}$. By the TOYLANG semantics, $v$ has the bit-vector 
value bvadd $\llbracket 2^{w-1} \rrbracket_\sigma$ $\llbracket 2^{w-1} \rrbracket_\sigma = 0$ after execution. Clearly, 0 is not a root of the 
equation $v = 2^{w-1} + 2^{w-1}$. The abstraction is unsound. 

In order to check soundness for algebraic abstraction, we define soundness condi- 
tions for TOYLANG instructions to ensure that all computations are captured by cor- 
responding polynomial equations. Intuitively, we give an SMT QF_BV query for each 
instruction in a TOYLANG program such that the query is satisfiable if and only if the 
computation at the instruction can overflow. 

To this end, we first use SMT QF_BV logic to characterize computations in TOYLANG 
programs. Recall TOYLANG programs are in the static single assignment form. 
Figure 7 defines an SMT QF_BV query $\lfloor P \rfloor$ for any TOYLANG program $P$. Except 
the ASSUME instruction, the figure follows the semantics of TOYLANG. For instance, 
$\lfloor v \leftarrow \text{ADC}\ a_0\ a_1\ d \rfloor$ asserts $v$ equal to the bit-vector sum of $a_0$ and $a_1$ with $d$ extended 
by $w - 1$ zeros in the SMT QF_BV query. Others are similar. It is not hard to see that all 
computations of a TOYLANG program satisfy the corresponding SMT QF_BV query. 


Lemma 1. Let $P$ be a TOYLANG program without ASSERT instructions and $\sigma$, $\sigma'$ 
stores with $(|\sigma, P, \sigma'|)$. Then the SMT QF_BV query $\lfloor P \rfloor$ is satisfied by the store $\sigma'$. 
⌊v ← ADD a0 a1⌋ = { (assert (= v (bvadd a0 a1))) } 
⌊v ← ADC a0 a1 d⌋ = { (assert (= v (bvadd (bvadd a0 a1) (zero_extend d (w − 1))))) } 
⌊c : v ← ADDS a0 a1⌋ = { (assert (= c (bvextract w w bvx))), 
                         (assert (= v (bvextract (w − 1) 0 bvx))) } 
    where bvx is (bvadd (zero_extend a0 1) (zero_extend a1 1)) 
⌊c : v ← ADCS a0 a1 d⌋ = { (assert (= c (bvextract w w bvx))), 
                           (assert (= v (bvextract (w − 1) 0 bvx))) } 
    where bvx is (bvadd (bvadd (zero_extend a0 1) (zero_extend a1 1)) 
                        (zero_extend d w)) 
⌊v ← SUB a0 a1⌋ = { (assert (= v (bvsub a0 a1))) } 
⌊c : v ← SUBS a0 a1⌋ = { (assert (= c (bvextract w w bvx))), 
                         (assert (= v (bvextract (w − 1) 0 bvx))) } 
    where bvx is (bvsub (zero_extend a0 1) (zero_extend a1 1)) 
⌊v ← MUL a0 a1⌋ = { (assert (= v (bvmul a0 a1))) } 
⌊vH : vL ← MULL a0 a1⌋ = { (assert (= vH (bvextract (2w − 1) w bvx))), 
                           (assert (= vL (bvextract (w − 1) 0 bvx))) } 
    where bvx is (bvmul (zero_extend a0 w) (zero_extend a1 w)) 
⌊v ← SHL a0 n⌋ = { (assert (= v (bvshl a0 n))) } 
⌊v ← SHR a0 n⌋ = { (assert (= v (bvlshr a0 n))) } 
⌊ASSUME q⌋ = ∅ 
⌊s P⌋ = ⌊s⌋ ∪ ⌊P⌋ 


Fig. 7. Soundness Conditions I 


Our next task is to define SMT QF_BV queries for instructions such that their algebraic
abstraction is unsound if and only if the corresponding SMT QF_BV query is
satisfiable (Fig. 8). The instruction $v \leftarrow \text{ADD}\ a_0\ a_1$ is lifted to $v = a_0 + a_1$. The abstraction
is unsound when there is a carry, that is, when (bvextract w w (bvadd (zero_extend a0 1)
(zero_extend a1 1))) is 1. The instructions ADC and SUB are similar. Algebraic abstraction
for the instructions ADDS, ADCS and SUBS is always sound. Their corresponding
SMT QF_BV queries are therefore not satisfiable: (assert $\bot$). For the half-multiplication
$v \leftarrow \text{MUL}\ a_0\ a_1$, its abstraction $v = a_0 \cdot a_1$ is unsound when the most significant
$w$ bits of the product of $a_0$ and $a_1$ are not all zeros. The corresponding SMT
QF_BV query is hence (assert (not (= 0 (bvextract (2w - 1) w bvx)))) where bvx
is the bit-vector product of $a_0$ and $a_1$, both zero-extended to $2w$ bits. The abstraction
for the full-multiplication instruction is never unsound. For the $v \leftarrow \text{SHL}\ a_0\ n$ instruction,
its algebraic abstraction is unsound if the most significant $n$ bits of $a_0$ are not all zeros.
The algebraic abstraction of the $v \leftarrow \text{SHR}\ a_0\ n$ instruction is unsound when the least
significant $n$ bits of $a_0$ are not all zeros. The relevant bits are obtained by bvextract in
each case. The abstraction for ASSUME is always sound.
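The conditions described above can be replayed on small word sizes. The following Python sketch is illustrative only: it models $w$-bit bit-vectors as non-negative integers, all function names are hypothetical, and the lifted equations assumed for SHL ($v = a_0 \cdot 2^n$) and SHR ($v \cdot 2^n = a_0$) are our reading of the abstraction, which is defined outside this section. It checks exhaustively that each query is satisfiable exactly when the machine result disagrees with the integer equation.

```python
def add_unsound(a0: int, a1: int, w: int) -> bool:
    """ADD: bit w of the sum of the operands zero-extended by one bit is 1."""
    bvx = (a0 + a1) % (1 << (w + 1))
    return ((bvx >> w) & 1) == 1


def sub_unsound(a0: int, a1: int, w: int) -> bool:
    """SUB: bit w of the (w+1)-bit difference is 1, i.e. a borrow occurred."""
    bvx = (a0 - a1) % (1 << (w + 1))
    return ((bvx >> w) & 1) == 1


def mul_unsound(a0: int, a1: int, w: int) -> bool:
    """MUL: the upper w bits of the 2w-bit product are not all zeros."""
    bvx = (a0 * a1) % (1 << (2 * w))
    return (bvx >> w) != 0


def shl_unsound(a0: int, n: int, w: int) -> bool:
    """SHL: the most significant n bits of a0 are not all zeros."""
    return (a0 >> (w - n)) != 0


def shr_unsound(a0: int, n: int, w: int) -> bool:
    """SHR: the least significant n bits of a0 are not all zeros."""
    return (a0 & ((1 << n) - 1)) != 0


def conditions_are_exact(w: int) -> bool:
    """Exhaustively compare each condition with the integer equation it guards."""
    mask = (1 << w) - 1
    for a0 in range(1 << w):
        for a1 in range(1 << w):
            if add_unsound(a0, a1, w) == (((a0 + a1) & mask) == a0 + a1):
                return False
            if sub_unsound(a0, a1, w) == (((a0 - a1) & mask) == a0 - a1):
                return False
            if mul_unsound(a0, a1, w) == (((a0 * a1) & mask) == a0 * a1):
                return False
        for n in range(w):
            if shl_unsound(a0, n, w) == (((a0 << n) & mask) == (a0 << n)):
                return False
            if shr_unsound(a0, n, w) == (((a0 >> n) << n) == a0):
                return False
    return True
```

Calling `conditions_are_exact(4)` enumerates all 4-bit operands; an SMT solver performs the same check symbolically for the actual word size.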

To check soundness of the algebraic abstraction $|s|$ for the instruction $s$ in the
TOYLANG program $P\ s$, we apply Lemma 1 to obtain a computation of $P$ through $\lceil P \rceil$
and check if $\lfloor s \rfloor$ for $s$ is unsatisfiable. We say the soundness condition for the instruction
$s$ in the TOYLANG program $P\ s$ holds if $\lfloor P\ s \rfloor$ is unsatisfiable. In order to ensure
the soundness of the abstract polynomial program $|P|$ for the TOYLANG program $P$,
soundness conditions for all instructions in $P$ must hold. That is, soundness conditions
for $s$ in all prefixes $P'\ s$ of $P$ must hold. Define the valuation $\rho_\sigma$ of the store $\sigma$ by
$\rho_\sigma(v) = \mathit{toZ}([\![v]\!]_\sigma)$ for every $v \in X$. The next proposition gives the soundness condition.


$\lfloor v \leftarrow \text{ADD}\ a_0\ a_1 \rfloor$ = { (assert (= 1 (bvextract w w bvx))) }
    where bvx is (bvadd (zero_extend a0 1) (zero_extend a1 1))
$\lfloor v \leftarrow \text{ADC}\ a_0\ a_1\ d \rfloor$ = { (assert (= 1 (bvextract w w bvx))) }
    where bvx is (bvadd (bvadd (zero_extend a0 1) (zero_extend a1 1)) (zero_extend d w))
$\lfloor c : v \leftarrow \text{ADDS}\ a_0\ a_1 \rfloor$ = { (assert $\bot$) }
$\lfloor c : v \leftarrow \text{ADCS}\ a_0\ a_1\ d \rfloor$ = { (assert $\bot$) }
$\lfloor v \leftarrow \text{SUB}\ a_0\ a_1 \rfloor$ = { (assert (= 1 (bvextract w w bvx))) }
    where bvx is (bvsub (zero_extend a0 1) (zero_extend a1 1))
$\lfloor c : v \leftarrow \text{SUBS}\ a_0\ a_1 \rfloor$ = { (assert $\bot$) }
$\lfloor v \leftarrow \text{MUL}\ a_0\ a_1 \rfloor$ = { (assert (not (= 0 (bvextract (2w - 1) w bvx)))) }
    where bvx is (bvmul (zero_extend a0 w) (zero_extend a1 w))
$\lfloor v_H : v_L \leftarrow \text{MULL}\ a_0\ a_1 \rfloor$ = { (assert $\bot$) }
$\lfloor v \leftarrow \text{SHL}\ a_0\ n \rfloor$ = { (assert (not (= 0 (bvextract (w - 1) (w - n) a0)))) }
$\lfloor v \leftarrow \text{SHR}\ a_0\ n \rfloor$ = { (assert (not (= 0 (bvextract (n - 1) 0 a0)))) }
$\lfloor \text{ASSUME}\ q \rfloor$ = { (assert $\bot$) }
$\lfloor P\ s \rfloor$ = $\lceil P \rceil \cup \lfloor s \rfloor$

Fig. 8. Soundness Conditions II


Proposition 1 (Soundness). Let $P$ be a TOYLANG program without ASSERT instructions
and $\sigma$, $\sigma'$ stores with $(\!|\sigma, P, \sigma'|\!)$. Then $\rho_{\sigma'}$ is a root of the system of (modular) equations
$|P|$ if soundness conditions for $s$ in every prefix $P'\ s$ of $P$ hold.


We say that the soundness condition for $P$ holds if soundness conditions for $s$ in all
prefixes $P'\ s$ of $P$ hold. Let us take a closer look at the abstract Montgomery reduction
program (Fig. 6). The half-multiplication instruction $m \leftarrow \text{MUL}\ T_L\ N'$ is lifted to
$m = T_L \cdot N'$. However, the soundness condition for the instruction requires the most
significant 64 bits of the product to be zeros (Fig. 8). Since $T_L$ is arbitrary, the soundness
condition does not hold in general. To obtain a sound algebraic abstraction for
Montgomery reduction, we modify the TOYLANG program slightly (Fig. 9).

In the revised program, the first full-multiplication instruction is used to compute
the least significant 64 bits of the product of $T_L$ and $N'$ (marked by ✓). The most
significant 64 bits of the product are stored in the variable $dc$ (for don't care). Note that
the soundness condition of the revised program holds trivially. The algebraic abstraction
for the revised Montgomery reduction program is sound by Proposition 1.
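To make the revised program concrete, the following Python sketch simulates its instructions on 64-bit words and checks both assertions at run time. It is a minimal illustration, not the authors' tool: the helper names are hypothetical, and the modulus used in the usage note is an arbitrary odd 64-bit number chosen for the example.

```python
W = 64
MASK = (1 << W) - 1


def mull(a: int, b: int):
    """MULL: full multiplication, returns (high word, low word)."""
    p = a * b
    return p >> W, p & MASK


def adds(a: int, b: int):
    """ADDS: addition with carry out, returns (carry, sum word)."""
    s = a + b
    return s >> W, s & MASK


def adcs(a: int, b: int, d: int):
    """ADCS: addition with carry in and carry out, returns (carry, sum word)."""
    s = a + b + d
    return s >> W, s & MASK


def montgomery_revised(T_H: int, T_L: int, N: int, N_prime: int):
    """Run the revised reduction on the 128-bit input T_H * 2^64 + T_L."""
    # ASSUME N * N' + 1 = 0 mod 2^64
    assert (N * N_prime + 1) % (1 << W) == 0
    dc, m = mull(T_L, N_prime)          # dc holds the discarded high word
    mN_H, mN_L = mull(m, N)
    carry, t_L = adds(T_L, mN_L)
    c, t_H = adcs(T_H, mN_H, carry)
    # First ASSERT: t_L is a 64-bit word, so "= 0 mod 2^64" means it is 0.
    assert t_L == 0
    # Second ASSERT: (c * 2^64 + t_H) * 2^64 = T_H * 2^64 + T_L mod N
    assert (((c << W) + t_H) << W) % N == ((T_H << W) + T_L) % N
    return c, t_H
```

For instance, with `N = 2**64 - 59` and `N_prime = (-pow(N, -1, 2**64)) % 2**64` (Python 3.8 or later), `montgomery_revised` passes both assertions for any pair of 64-bit words `T_H`, `T_L`. No instruction in the sketch discards a non-zero high part silently, which mirrors why the soundness condition holds trivially.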


4.2 Polynomial Program Verification 


Let $P$ be a TOYLANG program without ASSERT instructions. Our goal is to verify
$|P\ \text{ASSERT}\ \phi|$ with algebraic abstraction. Consider the system of (modular) equations
$\Phi = |P|$. For any stores $\sigma$ and $\sigma'$ with $(\!|\sigma, P, \sigma'|\!)$, $\rho_{\sigma'}$ is a root of $\Phi$ if the soundness
condition for $P$ holds by Proposition 1. To verify ASSERT $\phi$ on $\sigma'$, we need to check if
$\rho_{\sigma'}$ is also a root of the (modular) equation $\phi$. That is, we want to show $\forall X.\ \Phi \implies \phi$.


ASSUME $N \times N' + 1 \equiv 0 \bmod [2^{64}]$    |  $N \times N' + 1 \equiv 0 \bmod [2^{64}]$
✓ $dc : m \leftarrow \text{MULL}\ T_L\ N'$    |  $dc \cdot 2^{64} + m = T_L \cdot N'$,
$mN_H : mN_L \leftarrow \text{MULL}\ m\ N$    |  $mN_H \cdot 2^{64} + mN_L = m \cdot N$,
$carry : t_L \leftarrow \text{ADDS}\ T_L\ mN_L$    |  $carry \cdot (carry - 1) = 0$,
                                              |  $carry \cdot 2^{64} + t_L = T_L + mN_L$,
$c : t_H \leftarrow \text{ADCS}\ T_H\ mN_H\ carry$    |  $c \cdot (c - 1) = 0$,
                                              |  $c \cdot 2^{64} + t_H = T_H + mN_H + carry$,
ASSERT $t_L \equiv 0 \bmod [2^{64}]$    |
ASSUME $t_L = 0$    |  $t_L = 0$
ASSERT $(c \times 2^{64} + t_H) \times 2^{64} \equiv T_H \times 2^{64} + T_L \bmod [N]$    |

Fig. 9. Abstract Montgomery Reduction (Revised)


Proposition 2. Let $P$ be a TOYLANG program without ASSERT instructions and $\phi$ a
(modular) equation. Suppose the soundness condition for $P$ holds. The assertion in
$|P\ \text{ASSERT}\ \phi|$ succeeds if $\forall X.\ |P| \implies \phi$.


We extend [14] to check the root entailment problem. Recall that $\Phi$ is a system of
(modular) equations. We first simplify it to a system of equations. This is best seen by
an example. Consider $\forall x\, y\, u\, v.\ x \equiv y \bmod [3u^2, u + v] \implies 0 = 0$. We have

    $\forall x\, y\, u\, v.\ x \equiv y \bmod [3u^2, u + v] \implies 0 = 0$
iff $\forall x\, y\, u\, v.\ [\exists k_0\, k_1\, (x - y = 3u^2 \cdot k_0 + (u + v) \cdot k_1)] \implies 0 = 0$
iff $\forall x\, y\, u\, v\, k_0\, k_1.\ x - y = 3u^2 \cdot k_0 + (u + v) \cdot k_1 \implies 0 = 0$.


Therefore, it suffices to consider the problem of checking $\forall X.\ \Psi \implies \phi$ where $\Psi$ is a
system of equations and $\phi$ is a (modular) equation. We solve the simplified problem by
constructing instances of the ideal membership problem.

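Let $\Psi = \{e_0 = e_0', e_1 = e_1', \ldots, e_n = e_n'\}$. Consider the ideal
$I = \langle e_0 - e_0', e_1 - e_1', \ldots, e_n - e_n' \rangle$ generated by the polynomial equations in $\Psi$.
Suppose the polynomial $e - e' \in I$. We claim $\forall X.\ \Psi \implies e = e'$. Indeed,
$e - e' = (e_0 - e_0') \cdot h_0 + (e_1 - e_1') \cdot h_1 + \cdots + (e_n - e_n') \cdot h_n$
for some $h_0, h_1, \ldots, h_n \in \mathbb{Z}[X]$ since $e - e' \in I$.
For any root $\rho$ of $\Psi$, $(e_0 - e_0')[\rho] = (e_1 - e_1')[\rho] = \cdots = (e_n - e_n')[\rho] = 0$. Hence
$(e - e')[\rho] = ((e_0 - e_0') \cdot h_0)[\rho] + ((e_1 - e_1') \cdot h_1)[\rho] + \cdots + ((e_n - e_n') \cdot h_n)[\rho] = 0$.
Thus $\rho$ is also a root of $e - e' = 0$, and $\forall X.\ \Psi \implies e = e'$.

Now suppose the polynomial $e - e' \in I + \langle f_0, f_1, \ldots, f_m \rangle$. We claim
$\forall X.\ \Psi \implies e \equiv e' \bmod [f_0, f_1, \ldots, f_m]$. Since $e - e' \in I + \langle f_0, f_1, \ldots, f_m \rangle$,
$e - e' = (e_0 - e_0') \cdot h_0 + (e_1 - e_1') \cdot h_1 + \cdots + (e_n - e_n') \cdot h_n + f_0 \cdot k_0 + f_1 \cdot k_1 + \cdots + f_m \cdot k_m$
for some $h_0, h_1, \ldots, h_n, k_0, k_1, \ldots, k_m \in \mathbb{Z}[X]$. For any root $\rho$ of $\Psi$,
$(e - e')[\rho] = ((e_0 - e_0') \cdot h_0)[\rho] + ((e_1 - e_1') \cdot h_1)[\rho] + \cdots + ((e_n - e_n') \cdot h_n)[\rho] + f_0[\rho] k_0[\rho] + f_1[\rho] k_1[\rho] + \cdots + f_m[\rho] k_m[\rho]$
$= 0 + f_0[\rho] k_0[\rho] + f_1[\rho] k_1[\rho] + \cdots + f_m[\rho] k_m[\rho]$. We again
have $\forall X.\ \Psi \implies e \equiv e' \bmod [f_0, f_1, \ldots, f_m]$ as required.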
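As a small worked instance of the argument (the system below is made up for illustration and does not come from the paper), take $\Psi = \{x = y + 1,\ z = x \cdot y\}$, so that $I = \langle x - y - 1,\ z - x \cdot y \rangle$. The claim $\forall X.\ \Psi \implies z = y^2 + y$ follows by exhibiting cofactors $h_0 = y$ and $h_1 = 1$:

```latex
\begin{align*}
(x - y - 1) \cdot y + (z - x \cdot y) \cdot 1
  &= (x y - y^2 - y) + (z - x y) \\
  &= z - (y^2 + y).
\end{align*}
```

Hence $z - (y^2 + y) \in I$. For a modular consequence such as $z \equiv y^2 + y + 3u \bmod [3]$, the same cofactors together with $k_0 = -u$ show $z - (y^2 + y + 3u) \in I + \langle 3 \rangle$.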

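Our discussion is summarized as follows.

$e = e' \leadsto \langle e - e' \rangle$

$e \equiv e' \bmod [f_0, f_1, \ldots, f_m] \leadsto \langle e - e' - f_0 \cdot k_0 - f_1 \cdot k_1 - \cdots - f_m \cdot k_m \rangle$
    where $k_0, k_1, \ldots, k_m$ are fresh variables

$\emptyset \leadsto \langle 0 \rangle$

$\{\phi\} \cup \Phi \leadsto I + J$    if $\phi \leadsto I$ and $\Phi \leadsto J$


Fig. 10. Polynomial Programs to Ideals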


Proposition 3. Let $P$ be a TOYLANG program without ASSERT instructions and $I$ the
ideal with $|P| \leadsto I$ (Fig. 10). Then


1. $\forall X.\ |P| \implies e = e'$ if $e - e' \in I$;
2. $\forall X.\ |P| \implies e \equiv e' \bmod [f_0, f_1, \ldots, f_m]$ if $e - e' \in I + \langle f_0, f_1, \ldots, f_m \rangle$.


In order to verify (modular) equations with algebraic abstraction, Proposition 1 is
applied to ensure the soundness of abstraction. Proposition 3 then checks whether (mod- 
ular) equations indeed are satisfied for abstract polynomial programs. The main theorem 
summarizes our theoretical developments. 


Theorem 1. Let $P$ be a TOYLANG program without ASSERT instructions, $\sigma$, $\sigma'$ stores
with $(\!|\sigma, P, \sigma'|\!)$ and $I$ the ideal with $|P| \leadsto I$. If the soundness condition for $P$ holds,


1. the assertion in $|P\ \text{ASSERT}\ e = e'|$ succeeds provided $e - e' \in I$;
2. the assertion in $|P\ \text{ASSERT}\ e \equiv e' \bmod [f_0, f_1, \ldots, f_m]|$ succeeds provided
   $e - e' \in I + \langle f_0, f_1, \ldots, f_m \rangle$.


The ideal membership problem can be solved by computing Gröbner bases for ideals [7].
Many computer algebra systems compute Gröbner bases for ideals with simple
commands. For instance, the groebner command in SINGULAR [13] computes a
Gröbner basis for any ideal by a user-specified monomial ordering. The reduce command
then checks if a polynomial belongs to the ideal via its Gröbner basis.

Recall the abstract polynomial program for revised Montgomery reduction in Fig. 9.
Figure 11a shows the ideal for the abstract polynomial program before ASSUME $t_L = 0$.
To verify the two ASSERT instructions, Figs. 11b and 11c show the instances of the
ideal membership problem corresponding to the two assertions. Observe the ideal $\langle t_L \rangle$
corresponds to ASSUME $t_L = 0$ in Fig. 11c. Since the soundness condition for the
abstract polynomial program holds trivially (Sect. 4.1), it remains to check the ideal
membership problem. Both instances are verified immediately.


5 Certified Verification 


In TOYLANG, we only highlight the instructions necessary to verify unsigned Montgomery
reduction. For real-world programs performing non-linear computation, more instructions
are needed and the signed representation of bit-vectors is also used. In order to verify
real-world cryptographic programs, we extend algebraic abstraction with these features
found in CRYPTOLINE [9, 29]. For such complicated languages, algebraic abstraction
can be tedious to implement. Its verification algorithm moreover relies on complex
algorithms from computer algebra systems and SMT QF_BV solvers. It is unclear
whether these external tools function correctly on given instances. In order to improve
the quality of verification results, we have verified algebraic abstraction with the proof
assistant COQ, and certified results from external tools with COQ and a verified
certificate checker. We briefly describe how to verify our algorithms and certify results from
external tools. Please see the technical report [28] for details.


$I = \langle\ N \cdot N' + 1 - 2^{64} \cdot k_0,\ \ dc \cdot 2^{64} + m - T_L \cdot N',\ \ mN_H \cdot 2^{64} + mN_L - m \cdot N,$
      $carry \cdot (carry - 1),\ \ carry \cdot 2^{64} + t_L - (T_L + mN_L),\ \ c \cdot (c - 1),$
      $c \cdot 2^{64} + t_H - (T_H + mN_H + carry)\ \rangle$

(a) Ideal $I$


$t_L \in I + \langle 2^{64} \rangle$

(b) ASSERT $t_L \equiv 0 \bmod [2^{64}]$


$(c \cdot 2^{64} + t_H) \cdot 2^{64} - (T_H \cdot 2^{64} + T_L) \in I + \langle t_L \rangle + \langle N \rangle$

(c) ASSERT $(c \times 2^{64} + t_H) \times 2^{64} \equiv T_H \times 2^{64} + T_L \bmod [N]$


Fig. 11. Instances of Ideal Membership Problem


5.1 Verified Abstraction Algorithm 


The proof assistant COQ with the SSREFLECT library [4,11] is used to verify our 
algebraic abstraction technique. We define the TOYLANG syntax as a COQ data type 
(Fig. 1). The COQ-NBITS theory [26] is adopted to formalize the semantics of TOYLANG
(Fig. 2). The COQ binary integer theory Z is used to formalize the semantics of
(modular) equations (Fig. 3). We formalize polynomial expressions with integral coef- 
ficients by the COQ polynomial expression theory PExpr Z. 

To see how our algebraic abstraction algorithm is verified, consider Proposition 2.
Let program be the COQ data type for TOYLANG programs and meqn the data type
for (modular) equations. We define the predicate algsnd : program → Prop for the
soundness condition for a given program (Figs. 7 and 8). Similarly, we define the function
algabs : program → seq meqn for our algebraic abstraction algorithm where
seq meqn is the COQ data type for sequences of meqn (Fig. 5). To write down the
formal statement for Proposition 2, it remains to formalize the root entailment. Let exp
and valuation be the data types for expressions and valuations respectively. Define the
function eval_exp : exp → valuation → Z which evaluates an expression to an integer
on a valuation; and eval_exps : seq exp → valuation → seq Z evaluates expressions
to integers on a valuation. Consider the predicate eval_bexp : meqn → valuation →
Prop defined by


eval_bexp (e = e′) rho := eval_exp e rho = eval_exp e′ rho
eval_bexp (e ≡ e′ mod fs) rho := ∃ks, (eval_exp e rho) - (eval_exp e′ rho) =
                                      zadds (zmuls ks (eval_exps fs rho))


where zadds zs := foldl Z.add 0 zs and zmuls xs ys := map2 Z.mul xs ys. The
predicate eval_bexp (e = e′) rho checks if the expressions e and e′ evaluate to the same
integer on the valuation rho; eval_bexp (e ≡ e′ mod fs) rho checks if the difference
of eval_exp e rho and eval_exp e′ rho is equal to an integer linear combination of the
integers eval_exps fs rho. The predicate eval_bexp meq rho thus checks if rho is a root of the
(modular) equation meq.
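On concrete integer valuations the existential quantifier over ks can be decided directly: by Bézout's identity, the integer linear combinations of the evaluated moduli are exactly the multiples of their greatest common divisor. The Python sketch below illustrates this reading of eval_bexp. It is not the COQ definition; expressions are modelled as plain Python functions from a valuation (a dictionary) to an integer, and all names are hypothetical.

```python
from functools import reduce
from math import gcd


def eval_bexp_eq(e, e2, rho) -> bool:
    """Root check for an equation e = e2 on the valuation rho."""
    return e(rho) == e2(rho)


def eval_bexp_mod(e, e2, fs, rho) -> bool:
    """Root check for e = e2 mod fs: does some integer list ks satisfy
    e - e2 = sum(ks[i] * fs[i]) on the valuation rho?"""
    diff = e(rho) - e2(rho)
    g = reduce(gcd, (f(rho) for f in fs), 0)   # gcd of all evaluated moduli
    if g == 0:                                  # no moduli, or all evaluate to 0
        return diff == 0
    return diff % g == 0


# Example: x = y mod [3u^2, u + v] on two valuations.
x = lambda rho: rho["x"]
y = lambda rho: rho["y"]
moduli = [lambda rho: 3 * rho["u"] ** 2, lambda rho: rho["u"] + rho["v"]]

root = {"x": 17, "y": 5, "u": 2, "v": 1}       # moduli evaluate to 12 and 3
not_root = {"x": 18, "y": 5, "u": 2, "v": 1}   # difference 13 is not a multiple of 3
```

Here `eval_bexp_mod(x, y, moduli, root)` holds while `eval_bexp_mod(x, y, moduli, not_root)` does not.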

We are ready to formalize the root entailment. Consider the predicate entails (Phi 
: seq meqn) (psi : meqn) : Prop defined by 


∀rho, (∀phi, phi ∈ Phi → eval_bexp phi rho) → eval_bexp psi rho.


That is, every common root of the system Phi is also a root of psi. The following 
proposition formalizes Proposition 2 and is proved in COQ. 

Proposition 4. Let P : program be without assert instructions and psi : meqn. If
algsnd P and entails (algabs P) psi, then the assertion in |P assert psi| succeeds.


To apply this proposition to a given program P and a (modular) equation psi, one 
needs to show algsnd P and entails (algabs P) psi in COQ. In principle, both predicates
algsnd P and entails (algabs P) psi could be proved manually in COQ. However,
it would be impractical even for programs of moderate sizes. To address this problem, 
we establish these predicates through certificates computed by external tools. 


5.2 Verification through Certification 


To show algsnd P for an arbitrary program P, we follow the certified verification 
technique developed in the SMT QF_BV solver COQQFBV [26]. More concretely, 
we specify our bit-blasting algorithm for soundness conditions in COQ (Figs. 7 and 8). 
The algorithm converts soundness conditions to Boolean formulae in the conjunctive 
normal form. We then formally verify that soundness conditions hold if and only if the 
corresponding Boolean formulae are unsatisfiable in COQ. The constructed Boolean 
formulae are sent to the SAT solver KISSAT [5]. For each Boolean formula, KISSAT 
checks its satisfiability with a certificate. We then use the verified certificate checker 
GRATCHK [16] to validate these certificates. 

Our next goal is to show entails (algabs P) psi. More generally, we show entails 
Phi psi with arbitrary Phi : seq meqn and psi : meqn via the COQ polynomial ring 
theory and the computer algebra system SINGULAR [13]. To this end, we first formu- 
late the root entailment of polynomial expressions in the COQ polynomial ring theory. 
Recall PExpr Z is the COQ data type for polynomial expressions with integral coefficients.
Given integers, the function zpeval : PExpr Z → seq Z → Z evaluates a
polynomial expression to an integer. We formalize the root entailment of polynomial
expressions by the predicate zpentails (Pi : seq (PExpr Z)) (tau : PExpr Z):


∀zs, (∀pi, pi ∈ Pi → zpeval pi zs = 0) → zpeval tau zs = 0.

We proceed to connect the root entailment of (modular) equations to the root entailment
of polynomial expressions. Let the functions zpexprs_of_exprs : seq expr →
seq (PExpr Z) and zpexprs_of_meqns : seq meqn → seq (PExpr Z) convert expressions
and (modular) equations to polynomial expressions respectively (Fig. 10). When
the consequence of root entailment is a modular equation, recall that moduli in the
consequence become ideal generators (Proposition 3). To extract moduli from consequences,
define zpexpr_of_conseq : meqn → PExpr Z × seq (PExpr Z) by


zpexpr_of_conseq (e = e′) := (e - e′, [::])
zpexpr_of_conseq (e ≡ e′ mod fs) := (e - e′, zpexprs_of_exprs fs)


The following COQ lemma shows how to check the root entailment of (modular) equa- 
tions through the root entailment of polynomial expressions: 


Lemma 2. ∀ (Phi : seq meqn) (psi : meqn), zpentails (Pi ++ zpexprs_of_meqns
Phi) tau implies entails Phi psi where (tau, Pi) = zpexpr_of_conseq psi.


Note that moduli in the consequence psi are added to the antecedents Phi. 

Our last step is to show zpentails (Pi ++ zpexprs_of_meqns Phi) tau. Again, 
we establish the generalized form zpentails Pi tau for polynomial expressions Pi 
and a polynomial expression tau. We prove the predicate by showing that tau can 
be expressed as a combination of expressions in Pi. Consider the predicate vali- 
date_zpentails (Xi : seq (PExpr Z)) (Pi : seq (PExpr Z)) (tau : PExpr Z) defined 
by 


size Xi = size Pi ∧
ZPeq (ZPnorm tau) (ZPnorm (foldl ZPadd 0 (map2 ZPmul Xi Pi))).


The predicate validate_zpentails first checks that Xi and Pi are of the same size. It
then normalizes the polynomials tau and foldl ZPadd 0 (map2 ZPmul Xi Pi) using
ZPnorm. If the normalized polynomials are equal (ZPeq), the predicate is true. In foldl
ZPadd 0 (map2 ZPmul Xi Pi), ZPadd and ZPmul are the constructors for polynomial
expression addition and multiplication respectively. The expression map2 ZPmul
Xi Pi hence returns products of elements in Xi with corresponding elements in Pi. The
expression foldl ZPadd 0 (map2 ZPmul Xi Pi) then computes the sum of these products.
The predicate validate_zpentails Xi Pi tau therefore checks if tau is equal to a
polynomial combination of expressions in Pi. In other words, tau belongs to the ideal
generated by Pi. Using Lemma 2, we prove the following variant of Proposition 3 in
COQ:


Proposition 5. ∀ Phi psi Xi, validate_zpentails Xi (Pi ++ zpexprs_of_meqns Phi)
tau implies entails Phi psi where (tau, Pi) = zpexpr_of_conseq psi.


The main difference between Propositions 3 and 5 lies in certifiability. There are
many ways to establish ideal membership. Proposition 5 asks for witnesses Xi to justify
ideal membership explicitly. Most importantly, such Xi need not be constructed
manually. They are in fact computed by external tools. Precisely, these polynomial
expressions are computed by the lift command in the computer algebra system
SINGULAR [13]. The lift command computes polynomial expressions representing tau
in the ideal generated by Pi ++ zpexprs_of_meqns Phi. After SINGULAR computes
these polynomial expressions, we convert them to polynomial expressions Xi in COQ.
The predicate validate_zpentails Xi (Pi ++ zpexprs_of_meqns Phi) tau checks if
tau is indeed represented by Xi using the COQ polynomial ring theory. If the check
succeeds, we obtain entails Phi psi by Proposition 5. Otherwise, the predicate entails
Phi psi is not established. Note that SINGULAR need not be trusted. If Xi is computed
incorrectly, the check validate_zpentails Xi (Pi ++ zpexprs_of_meqns Phi) tau will
fail in COQ. Proposition 5 allows us to show entails Phi psi with certification.
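The certificate check itself is simple arithmetic, which is what makes an untrusted solver acceptable. The Python sketch below mimics the idea of validate_zpentails on a toy representation; it is not the COQ development. Polynomials are dictionaries from exponent tuples to integer coefficients, the helper names are hypothetical, and the generators and witnesses in the example are made up.

```python
from collections import defaultdict


def padd(p, q):
    """Add two polynomials and drop zero coefficients (normal form)."""
    r = defaultdict(int)
    for poly in (p, q):
        for mono, coeff in poly.items():
            r[mono] += coeff
    return {m: c for m, c in r.items() if c != 0}


def pmul(p, q):
    """Multiply two polynomials over the integers."""
    r = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            mono = tuple(a + b for a, b in zip(m1, m2))
            r[mono] += c1 * c2
    return {m: c for m, c in r.items() if c != 0}


def validate(xi, pi, tau) -> bool:
    """Check that tau equals the sum of xi[i] * pi[i], comparing normal forms."""
    if len(xi) != len(pi):
        return False
    total = {}
    for witness, generator in zip(xi, pi):
        total = padd(total, pmul(witness, generator))
    return total == padd(tau, {})


# Variables are ordered (x, y, z); each key is an exponent tuple.
gen0 = {(1, 0, 0): 1, (0, 1, 0): -1, (0, 0, 0): -1}   # x - y - 1
gen1 = {(0, 0, 1): 1, (1, 1, 0): -1}                  # z - x*y
tau = {(0, 0, 1): 1, (0, 2, 0): -1, (0, 1, 0): -1}    # z - y^2 - y
good = [{(0, 1, 0): 1}, {(0, 0, 0): 1}]               # witnesses y and 1
bad = [{(0, 0, 0): 1}, {(0, 0, 0): 1}]                # witnesses 1 and 1
```

With these values `validate(good, [gen0, gen1], tau)` succeeds and `validate(bad, [gen0, gen1], tau)` fails, so a wrong answer from the external tool is rejected rather than trusted.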


5.3 Optimization 


Many optimizations are needed, and verified, to make algebraic abstraction feasible
for TOYLANG programs with thousands of instructions. For instance, the static single
assignment transformation and program slicing algorithms are both specified and
verified in COQ. Furthermore, the bit-blasting algorithm is extended significantly to
check soundness conditions effectively. For example, the soundness condition for the
half-multiplication instruction MUL requires bvmul (Fig. 8). This does not work well
because of the complicated non-linear bit-vector computation. To reduce the complexity
of overflow checking in half-multiplication, we implement and verify the algorithm
from [10]. Last but not least, algebraic abstraction almost surely induces ideals with
hundreds, if not thousands, of polynomial generators. Computing Gröbner bases for such
ideals is infeasible. To address this problem, we develop heuristics to reduce the number
of generators in ideals through rewriting. Our heuristics are also specified and verified
in COQ. These optimizations are essential in our experiments.


6 Evaluation 


We have implemented certified algebraic abstraction in the tool COQCRYPTOLINE [1]. 
COQCRYPTOLINE is built upon OCAML code extracted from our COQ development.
It calls the computer algebra system SINGULAR [13] and certifies answers from the 
algebraic tool. The certified SMT QF_BV solver COQQFBV [26] is used to verify 
soundness conditions. We choose two classes of real-world cryptographic programs in 
experiments. For elliptic curve cryptography, we verify various field or group operations 
from BITCOIN [27], BORINGSSL [8,12], NSS [20], OPENSSL [23], and PQCRYPTO- 
SIDH [18]. For post-quantum cryptography, we verify the C reference and optimized 
Intel avx2 implementations of the Number-Theoretic Transform in the cryptosystem 
KYBER [6]. Experiments are conducted on an Ubuntu 22.04.1 Linux server with 
3.20 GHz 32-core Xeon Gold 6134M and 1TB RAM. 

We compare COQCRYPTOLINE with the uncertified CRYPTOLINE [9,24]. Table 1
shows the experimental results. L_CL shows the number of instructions. T_CCL and T_CL
give the verification time of COQCRYPTOLINE and CRYPTOLINE in seconds respectively.
%INT shows the percentage of time spent in extracted OCAML programs in
COQCRYPTOLINE. %CAS and %SMT give the percentages of time spent on SINGULAR
and COQQFBV respectively.

Table 1. Experimental Results on Industrial Cryptographic Programs

Function                           | L_CL | %INT  | %SMT  | %CAS  | T_CCL   | T_CL

bitcoin/asm/secp256k1_fe_*
mul_inner                          | 167  | 0.13  | 99.52 | 0.34  | 91.96   | 2.41
sqr_inner                          | 151  | 0.28  | 99.13 | 0.59  | 28.30   | 1.17

bitcoin/field/secp256k1_fe_*
mul_inner                          | 132  | 0.09  | 98.81 | 1.11  | 58.34   | 1.44
mul_int                            | 6    | 0.14  | 95.21 | 4.65  | 1.17    | 0.02
negate                             | 10   | 0.37  | 95.60 | 4.04  | 0.61    | 0.02
sqr_inner                          | 119  | 0.12  | 98.60 | 1.28  | 34.08   | 0.91

bitcoin/group/
secp256k1_ge_neg                   | 31   | 1.82  | 90.48 | 7.70  | 0.24    | 0.03
secp256k1_gej_double_var.part.14   | 948  | 0.53  | 98.93 | 0.54  | 1091.28 | 25.50

bitcoin/scalar/secp256k1_scalar_*
mul                                | 918  | 1.19  | 98.26 | 0.54  | 167.97  | 6.28
mul_512                            | 338  | 0.50  | 98.51 | 0.98  | 36.97   | 2.20
sqr                                | 929  | 1.49  | 97.81 | 0.70  | 147.07  | 5.41
sqr_512                            | 349  | 0.66  | 98.10 | 1.23  | 2745    | 3.11
secp256k1_scalar_reduce            | 104  | 2.50  | 91.18 | 6.32  | 1.21    | 0.09
secp256k1_scalar_reduce_512        | 580  | 1.62  | 97.50 | 0.88  | 47.83   | 1.88

boringssl/fiat_curve25519/fe_*
mul_impl                           | 114  | 0.04  | 99.67 | 0.29  | 70.85   | 1.65
sqr_impl                           | 96   | 0.09  | 99.38 | 0.53  | 25.30   | 0.75
fe_mul121666                       | 54   | 1.31  | 95.61 | 3.08  | 0.84    | 0.07
x25519_scalar_mult_generic*        | 1068 | 0.27  | 99.55 | 0.18  | 019.43  | 279.95

boringssl/fiat_curve25519_x86/fe_*
mul_impl                           | 375  | 0.38  | 99.28 | 0.34  | 81.67   | 1.79
sqr_impl                           | 299  | 0.52  | 99.08 | 0.40  | 39.89   | 0.97
fe_mul121666                       | 96   | 1.96  | 95.02 | 3.02  | 07      | 0.08
x25519_scalar_mult_generic*        | 3287 | 0.45  | 99.40 | 0.15  | 4454.87 | 240.00

nss/Hacl_Curve25519_51/
fmul0                              | 127  | 0.03  | 99.67 | 0.30  | 136.53  | 31.11
fmul1                              | 67   | 0.09  | 98.85 | 1.06  | 2.65    | 0.26
fsqr0                              | 98   | 0.03  | 99.64 | 0.33  | 75.10   | 2.90
fsqr20                             | 196  | 0.06  | 99.55 | 0.38  | 05.24   | 3.15
fmul20                             | 238  | 0.06  | 99.65 | 0.29  | 200.54  | 35.29
point_add_and_double*              | 1165 | 0.13  | 99.65 | 0.22  | 2611.51 | 355.34
point_double                       | 582  | 0.17  | 99.49 | 0.35  | 975.02  | 17.06

openssl/curve25519/fe51_*
mul                                | 111  | 0.06  | 99.66 | 0.28  | 57.91   | 1.20
sq                                 | 93   | 0.08  | 99.34 | 0.58  | 23.06   | 0.69
fe51_mul121666                     | 55   | 1.27  | 95.95 | 2.78  | 0.70    | 0.07
x25519_scalar_mult*                | 1042 | 0.29  | 99.54 | 0.17  | 912.24  | 281.26

PQCrypto-SIDH/P434/x86_64/
fpmul434                           | 266  | 91.74 | 0.02  | 8.24  | 0.39    | 0.05
fp2mul434                          | 1161 | 1.10  | 98.62 | 0.29  | 726.40  | 42.44

PQCrypto-SIDH/P503/arm64/
fpmul                              | 553  | 2.43  | 96.19 | 1.39  | 249.24  | 5.49
fpmul-fixed                        | 554  | 2.39  | 95.75 | 1.86  | 250.41  | 5.46

PQClean/kyber/NTT
PQCLEAN_KYBER512_CLEAN_ntt         | 6273 | 4.78  | 34.21 | 61.01 | 1113.92 | 46.54
PQCLEAN_KYBER768_AVX2_ntt          | 8975 | 5.41  | 83.63 | 10.96 | 433.31  | 29.63

* One (out of three) modular polynomial equation in post-conditions fails to certify due to stack overflow.


6.1 Field and Group Operation in Elliptic Curves 


In elliptic curve cryptography, a rational point on a curve is represented by field elements
from a large finite field. Rational points on the curve form a group. The group
operation in turn is computed by operations in the underlying finite field. In BITCOIN,
the finite field is $\mathbb{Z}_{p256k1}$ with $p256k1 = 2^{256} - 2^{32} - 2^9 - 2^8 - 2^7 - 2^6 - 2^4 - 1$.
The underlying field for Curve25519 is $\mathbb{Z}_{p25519}$ with $p25519 = 2^{255} - 19$. PQCRYPTO-SIDH
however uses slightly more complicated fields $\mathbb{Z}_{p434}/\langle x^2 + 1 \rangle$ and $\mathbb{Z}_{p503}/\langle x^2 + 1 \rangle$
with $p434 = 2^{216} \cdot 3^{137} - 1$ and $p503 = 2^{250} \cdot 3^{159} - 1$. Field elements in $\mathbb{Z}_{p256k1}$ and
$\mathbb{Z}_{p25519}$ are represented by multiple limbs of 64-bit numbers. Field multiplication, for
instance, is implemented by a number of 64-bit arithmetic instructions. Field elements
in $\mathbb{Z}_{p434}/\langle x^2 + 1 \rangle$ and $\mathbb{Z}_{p503}/\langle x^2 + 1 \rangle$ are of the form $u + vx$ where $u, v \in \mathbb{Z}_{p434}$
or $\mathbb{Z}_{p503}$ and $x^2 = -1$. Two moduli are used to specify multiplication for such fields:
$p434$, $x^2 + 1$ for $\mathbb{Z}_{p434}/\langle x^2 + 1 \rangle$, and $p503$, $x^2 + 1$ for $\mathbb{Z}_{p503}/\langle x^2 + 1 \rangle$. Multiplication
of PQCRYPTO-SIDH is easily specified by modular equations with multiple moduli.
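To illustrate what a specification with the two moduli $p434$ and $x^2 + 1$ says, the Python sketch below multiplies two elements $u + vx$ and then produces explicit cofactors showing that the full polynomial product and the reduced result differ by a combination of the two moduli. It is a schematic model on Python integers, not the verified assembly code; the function names are hypothetical.

```python
P434 = 2**216 * 3**137 - 1


def fp2_mul(a, b, p):
    """Multiply a = u1 + v1*x and b = u2 + v2*x using x^2 = -1, reduced mod p."""
    u1, v1 = a
    u2, v2 = b
    return ((u1 * u2 - v1 * v2) % p, (u1 * v2 + v1 * u2) % p)


def witness_is_valid(a, b, p) -> bool:
    """Check a*b - r = p*(k0 + k1*x) + (x^2 + 1)*h coefficient by coefficient,
    where r is the reduced product and k0, k1, h are computed cofactors."""
    u1, v1 = a
    u2, v2 = b
    r0, r1 = fp2_mul(a, b, p)
    h = v1 * v2                               # cofactor of x^2 + 1
    k0 = (u1 * u2 - v1 * v2 - r0) // p        # exact divisions by construction
    k1 = (u1 * v2 + v1 * u2 - r1) // p
    # Coefficients of 1, x, x^2 in the unreduced product minus the result.
    lhs = [u1 * u2 - r0, u1 * v2 + v1 * u2 - r1, v1 * v2]
    rhs = [p * k0 + h, p * k1, h]
    return lhs == rhs
```

For example, `fp2_mul((0, 1), (0, 1), P434)` returns `(P434 - 1, 0)`, reflecting $x \cdot x = -1$.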

COQCRYPTOLINE verifies every field operation with certification within 12.1 min. 
Group operations are implemented by field operations. Their certified verification thus 
takes more time. The most complicated case x25519_scalar_mult_generic (3287 instruc- 
tions) from BORINGSSL takes about 1.3 h.* In comparison, CRYPTOLINE verifies 
the same program in 4 min without certification. In almost all cases, a majority of 
time is spent on COQQFBV. Running time for extracted OCAML programs is negligible.
Interestingly, COQCRYPTOLINE finds a bug in the arm64 multiplication code
for $\mathbb{Z}_{p503}/\langle x^2 + 1 \rangle$ from PQCRYPTO-SIDH. Towards the end of multiplication, the
programmer incorrectly stores the register x25 in memory before adding a carry. After 
fixing the bug, COQCRYPTOLINE finishes certified verification in about 5 min. 


6.2 Number-Theoretic Transform in KYBER 


The United States National Institute of Standards and Technology (NIST) is cur- 
rently determining next-generation post-quantum cryptography (PQC) standards. In 
July 2022, CRYSTALS-KYBER (or simply KYBER) was announced to be the winner for
key establishment mechanisms. 

One of the most critical steps in KYBER is modular polynomial multiplication over
the polynomial ring $R_q = \mathbb{Z}_q[x]/\langle x^{256} + 1 \rangle$ with $q = 3329$. In $R_q$, coefficients are
elements in the field $\mathbb{Z}_q$. A polynomial in $R_q$ is obtained by reduction modulo $x^{256} + 1$
and hence has a degree less than 256. Consider $x^{256} \in \mathbb{Z}_q[x]$. Since
$x^{256} \equiv -1 \bmod \langle x^{256} + 1 \rangle$, $x^{256}$ is $-1$ in $R_q$. Unsurprisingly, polynomial multiplication
is one of the most expensive computations in KYBER. An efficient way to multiply
polynomials is through a discretized Fast Fourier Transform called the Number-Theoretic
Transform (NTT).

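Recall the Chinese remainder theorem for integers is but a ring isomorphism
between residue systems. For instance, $\mathbb{Z}_{42} \cong \mathbb{Z}_6 \times \mathbb{Z}_7$. For polynomial rings, we
have the following ring isomorphism


$\mathbb{Z}_q[x]/\langle x^{2n} - \omega^2 \rangle \cong \mathbb{Z}_q[x]/\langle x^n - \omega \rangle \times \mathbb{Z}_q[x]/\langle x^n + \omega \rangle \quad (\omega \in \mathbb{Z}_q).$


Observe that $x^n$ is equal to $\omega$ in $\mathbb{Z}_q[x]/\langle x^n - \omega \rangle$ for $x^n \equiv \omega \bmod \langle x^n - \omega \rangle$. Similarly,
$x^n$ is equal to $-\omega$ in $\mathbb{Z}_q[x]/\langle x^n + \omega \rangle$. Recall polynomials in $\mathbb{Z}_q[x]/\langle x^{2n} - \omega^2 \rangle$ have
degrees less than $2n$. We can rewrite any polynomial in $\mathbb{Z}_q[x]/\langle x^{2n} - \omega^2 \rangle$ as
$f(x) + g(x)x^n$ where degrees of $f$ and $g$ are both less than $n$. The polynomial $f(x) + g(x)x^n$
is then equal to $f(x) + \omega g(x)$ in $\mathbb{Z}_q[x]/\langle x^n - \omega \rangle$; and it is equal to $f(x) - \omega g(x)$ in
$\mathbb{Z}_q[x]/\langle x^n + \omega \rangle$. NTT computes the following ring isomorphism between
$\mathbb{Z}_q[x]/\langle x^{2n} - \omega^2 \rangle$ and $\mathbb{Z}_q[x]/\langle x^n - \omega \rangle \times \mathbb{Z}_q[x]/\langle x^n + \omega \rangle$ by substituting $\pm\omega$ for $x^n$
in $f(x) + g(x)x^n$:


$f(x) + g(x)x^n \mapsto (f(x) + \omega g(x),\ f(x) - \omega g(x)). \quad (1)$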


Multiplication in $\mathbb{Z}_q[x]/\langle x^{2n} - \omega^2 \rangle$ can therefore be computed by respective
multiplications in $\mathbb{Z}_q[x]/\langle x^n \pm \omega \rangle$ through the isomorphism. That is, a multiplication for
polynomials of degrees less than $2n$ (in $\mathbb{Z}_q[x]/\langle x^{2n} - \omega^2 \rangle$) is replaced by two
multiplications for polynomials of degrees less than $n$ (in $\mathbb{Z}_q[x]/\langle x^n \pm \omega \rangle$).
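The claim that one large multiplication can be traded for two smaller ones can be checked numerically on a single layer. The Python sketch below is a schoolbook model for illustration only: polynomials are coefficient lists in increasing degree, the function names are hypothetical, and the parameters in the usage note are arbitrary small values rather than those of an actual KYBER layer.

```python
Q = 3329


def polymul_mod(a, b, n, c, q=Q):
    """Multiply a and b (length-n coefficient lists) in Z_q[x]/<x^n - c>."""
    res = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                res[k] = (res[k] + ai * bj) % q
            else:
                # x^n is replaced by c, so x^k contributes c * x^(k - n).
                res[k - n] = (res[k - n] + ai * bj * c) % q
    return res


def split(p, w, q=Q):
    """Map f + g*x^n to the pair (f + w*g, f - w*g), as in (1)."""
    n = len(p) // 2
    f, g = p[:n], p[n:]
    plus = [(fi + w * gi) % q for fi, gi in zip(f, g)]
    minus = [(fi - w * gi) % q for fi, gi in zip(f, g)]
    return plus, minus


def layer_is_multiplicative(a, b, w, q=Q) -> bool:
    """Compare 'multiply, then split' with 'split, then multiply twice'."""
    n = len(a) // 2
    big = polymul_mod(a, b, 2 * n, (w * w) % q, q)
    a_plus, a_minus = split(a, w, q)
    b_plus, b_minus = split(b, w, q)
    small = (polymul_mod(a_plus, b_plus, n, w % q, q),
             polymul_mod(a_minus, b_minus, n, (-w) % q, q))
    return split(big, w, q) == small
```

For any two coefficient lists of equal even length, for example of length 8 with `w = 17`, `layer_is_multiplicative` returns `True`, since substituting $\pm\omega$ for $x^n$ respects both addition and multiplication.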

In KYBER, ring isomorphisms are applied repeatedly until linear polynomials are
obtained. That is, KYBER NTT computes the isomorphism


$R_q = \mathbb{Z}_q[x]/\langle x^{256} + 1 \rangle \cong \mathbb{Z}_q[x]/\langle x^2 - \zeta_0 \rangle \times \cdots \times \mathbb{Z}_q[x]/\langle x^2 - \zeta_{127} \rangle \quad (2)$


where $\zeta_j$'s are the principal 256-th roots of unity. A polynomial of a degree less than 256
is hence mapped via KYBER NTT to 128 linear polynomials, each modulo a different
$x^2 - \zeta_j$. In PQCLEAN [25], a reference C implementation and a hand-optimized Intel
avx2 assembly implementation of KYBER NTT are provided. In addition to degree
reduction, the two implementations utilize signed Montgomery reduction extensively
for efficient multiplication over $\mathbb{Z}_q$. We verify whether the two NTT implementations
compute the ring isomorphism correctly.

To specify the correctness requirements of KYBER NTT, one could write down
modular equations (1) according to its computation. Each equation would require
explicit substitution. Thanks to modular equations with multiple moduli, a more
intuitive and mathematical specification based on (2) is also expressible. Let
$F = \sum_{k=0}^{255} f_k x^k$ denote the input polynomial in $R_q = \mathbb{Z}_q[x]/\langle x^{256} + 1 \rangle$ and the coefficients
$f_k$'s are input variables with $-q < f_k < q$ ($0 \le k < 256$). Let $G_j = g_{j,0} + g_{j,1} x$ be the
$j$-th final output linear polynomial from the implementations. The modular equations


$F \equiv G_j \bmod [q,\ x^2 - \zeta_j]$, for all $0 \le j < 128$


specify the correctness of the KYBER NTT implementations. Observe that our specification
is almost identical to (2). Modular equations with multiple moduli allow cryptographic
programmers to express mathematical specification naturally. They greatly
improve usability and reduce specification efforts in algebraic abstraction.
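The specification can be read operationally: $G_j$ satisfies it exactly when $F - G_j$ leaves remainder zero after division by $x^2 - \zeta_j$ over $\mathbb{Z}_q$. The Python sketch below checks this for a reference reduction. It is an illustration under stated assumptions: it takes 17 as a primitive 256-th root of unity modulo 3329 and uses its odd powers as the $\zeta_j$ in plain increasing order, which need not match the order used by the verified implementations; all names are hypothetical.

```python
Q = 3329
ZETA = 17                      # assumed primitive 256-th root of unity mod Q
ZETAS = [pow(ZETA, 2 * j + 1, Q) for j in range(128)]   # order is an assumption


def reduce_mod_quadratic(F, zeta, q=Q):
    """Reference reduction of F (even number of coefficients) modulo
    [q, x^2 - zeta]: substitute zeta for x^2 and collect (g0, g1)."""
    g0 = g1 = 0
    power = 1                  # zeta^i
    for i in range(0, len(F), 2):
        g0 = (g0 + F[i] * power) % q
        g1 = (g1 + F[i + 1] * power) % q
        power = power * zeta % q
    return g0, g1


def satisfies_spec(F, G, zeta, q=Q) -> bool:
    """Check F = G mod [q, x^2 - zeta] by long division of F - G."""
    r = [c % q for c in F]
    r[0] = (r[0] - G[0]) % q
    r[1] = (r[1] - G[1]) % q
    for k in range(len(r) - 1, 1, -1):
        # Cancel the x^k term by subtracting r[k] * x^(k-2) * (x^2 - zeta).
        r[k - 2] = (r[k - 2] + r[k] * zeta) % q
        r[k] = 0
    return r[0] == 0 and r[1] == 0
```

Each `zeta` in `ZETAS` satisfies `pow(zeta, 128, Q) == Q - 1`, so every $x^2 - \zeta_j$ divides $x^{256} + 1$ over $\mathbb{Z}_q$; the pair returned by `reduce_mod_quadratic(F, zeta)` passes `satisfies_spec`, while any perturbed pair does not.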

COQCRYPTOLINE verifies the C reference implementation in about 18.6 min. The 
highly optimized avx2 implementation is verified in about 7.2 min. Observe that each 
layer of ring isomorphism requires 128 signed Montgomery reductions. KYBER NTT 
therefore has 7 x 128 = 896 Montgomery reductions similar to the running example in 
Fig. 4b. Algebraic abstraction successfully verifies the two KYBER NTT implementa- 
tions within 20 min. In comparison, CRYPTOLINE verifies both NTT implementations 
in 1 min without certification. 


7 Conclusion 


Verification through algebraic abstraction combines both algebraic and bit-accurate 
analyses. Non-linear computation is analyzed algebraically; soundness conditions are 
checked with bit-accurate SMT QF_BV solvers. We describe how to verify the tech- 
nique and certify its results. In the experiments, the hybrid technique successfully veri- 
fies non-linear integer computation found in cryptographic programs from elliptic curve 
and post-quantum cryptography with certification. We plan to explore more applications
of algebraic abstraction in programs from post-quantum cryptography in the near future.

Abstract. Multiparty session types (MSTs) are a type-based approach
to verifying communication protocols. Central to MSTs is a projection 
operator: a partial function that maps protocols represented as global 
types to correct-by-construction implementations for each participant, 
represented as a communicating state machine. Existing projection oper- 
ators are syntactic in nature, and trade efficiency for completeness. We 
present the first projection operator that is sound, complete, and efficient. 
Our projection separates synthesis from checking implementability. For 
synthesis, we use a simple automata-theoretic construction; for checking 
implementability, we present succinct conditions that summarize insights 
into the property of implementability. We use these conditions to show 
that MST implementability is PSPACE-complete. This improves upon 
a previous decision procedure that is in EXPSPACE and applies to a 
smaller class of MSTs. We demonstrate the effectiveness of our approach 
using a prototype implementation, which handles global types not sup- 
ported by previous work without sacrificing performance. 


Keywords: Protocol verification · Multiparty session types · 
Communicating state machines · Protocol fidelity · Deadlock freedom 


1 Introduction 


Communication protocols are key components in many safety- and operation-critical 
systems, making them prime targets for formal verification. Unfortunately, 
most verification problems for such protocols (e.g. deadlock freedom) are unde- 
cidable [11]. To make verification computationally tractable, several restrictions 
have been proposed [2,3,10,14,33,42]. In particular, multiparty session types 
(MSTs) [24] have garnered a lot of attention in recent years (see, e.g., the sur- 
vey by Ancona et al. [6]). In the MST setting, a protocol is specified as a global 
type, which describes the desired interactions of all roles involved in the protocol. 
Local implementations describe behaviors for each individual role. The imple- 
mentability problem for a global type asks whether there exists a collection of 
local implementations whose composite behavior when viewed as a communicat- 
ing state machine (CSM) matches that of the global type and is deadlock-free. 
The synthesis problem is to compute such an implementation from an imple- 
mentable global type. 

MST-based approaches typically solve synthesis and implementability simul- 
taneously via an efficient syntactic projection operator [18,24,34,41]. Abstractly, 
a projection operator is a partial map from global types to collections of imple- 
mentations. A projection operator proj is sound when every global type G in its 
domain is implemented by proj(G), and complete when every implementable 
global type is in its domain. Existing practical projection operators for MSTs are 
all incomplete (or unsound). Recently, the implementability problem was shown 
to be decidable for a class of MSTs via a reduction to safe realizability of glob- 
ally cooperative high-level message sequence charts (HMSCs) [38]. In principle, 
this result yields a complete and sound projection operator for the considered 
class. However, this operator would not be practical. In particular, the proposed 
implementability check is in EXPSPACE. 


Contributions. In this paper, we present the first practical sound and complete 
projection operator for general MSTs. The synthesis problem for implementable 
global types is conceptually easy [38] — the challenge lies in determining whether 
a global type is implementable. We thus separate synthesis from checking imple- 
mentability. We first use a standard automata-theoretic construction to obtain a 
candidate implementation for a potentially non-implementable global type. How- 
ever, unlike [38], we then verify the correctness of this implementation directly 
using efficiently checkable conditions derived from the global type. When a global 
type is not implementable, our constructive completeness proof provides a coun- 
terexample trace. 

The resulting projection operator yields a PSPACE decision procedure 
for implementability. In fact, we show that the implementability problem is 
PSPACE-complete. These results both generalize and tighten the decidability 
and complexity results obtained in [38]. 

We evaluate a prototype of our projection algorithm on benchmarks taken 
from the literature. Our prototype benefits from both the efficiency of existing 
lightweight but incomplete syntactic projection operators [18,24,34,41], and the 
generality of heavyweight automata-based model checking techniques [28,36]: it 
handles protocols rejected by previous practical approaches while preserving the 
efficiency that makes MST-based techniques so attractive. 


2 Motivation and Overview 


Incompleteness of Existing Projection Operators. A key limitation of 
existing projection operators is that the implementation for each role is obtained 
via a linear traversal of the global type, and thus shares its structure. The 
following example, which is not projectable by any existing approach, demonstrates 
how enforcing structural similarity can lead to incompleteness. 


Fig. 1. Odd-even: An implementable but not (yet) projectable protocol and its local 
implementations 


Example 2.1 (Odd-even). Consider the following global type $G_{oe}$: 

\[
+ \begin{cases}
p \to q{:}o.\; q \to r{:}o.\; \mu t_1.\, (p \to q{:}o.\; q \to r{:}o.\; q \to r{:}o.\; t_1 \;+\; p \to q{:}b.\; q \to r{:}b.\; r \to p{:}o.\; 0) \\
p \to q{:}m.\; \mu t_2.\, (p \to q{:}o.\; q \to r{:}o.\; q \to r{:}o.\; t_2 \;+\; p \to q{:}b.\; q \to r{:}b.\; r \to p{:}m.\; 0)
\end{cases}
\]

A term $p \to q{:}m$ specifies the exchange of message m between sender p and 
receiver q. The term represents two local events observed separately due to 
asynchrony: a send event $p \triangleright q!m$ observed by role p, and a receive 
event $q \triangleleft p?m$ observed by role q. The + operator denotes choice, 
$\mu t.\,G$ denotes recursion, and 0 denotes protocol termination. 

Figure 1a visualizes $G_{oe}$ as an HMSC. The left and right sub-protocols respectively 
correspond to the top and bottom branches of the protocol. Role p chooses 
a branch by sending either o or m to q. On the left, q echoes this message to r. 
Both branches continue in the same way: p sends an arbitrary number of o mes- 
sages to q, each of which is forwarded twice from q to r. Role p signals the end 
of the loop by sending b to q, which q forwards to r. Finally, depending on the 
branch, r must send o or m to p. 

Figures 1b and 1c depict the structural similarity between the global type $G_{oe}$ 
and the implementations for p and q. For the “choicemaker” role p, the reason is 
evident. Role q’s implementation collapses the continuations of both branches in 
the protocol into a single sub-component. For r (Fig. 1d), the situation is more 
complicated. Role r does not decide on or learn directly which branch is taken, 
but can deduce it from the parity of the number of o messages received from q: 
odd means left and even means right. The resulting local implementation features 
transitions going back and forth between the two branches that do not exist in 
the global type. Syntactic projection operators fail to create such transitions. $\triangleleft$ 


One response to the brittleness of existing projection operators has been to give 
up on global type specifications altogether and instead revert to model checking 
user-provided implementations [28,36]. We posit that what needs rethinking is 
not the concept of global types, but rather how projections are computed and 
how implementability is checked. 


Fig. 2. High-level message sequence charts for the global types of Example 2.2. 


Our Automata-Theoretic Approach. The synthesis step in our projection 
operator uses textbook automata-theoretic constructions. From a given global 
type, we derive a finite state machine, and use it to define a homomorphism 
automaton for each role. We then determinize this homomorphism automaton 
via subset construction to obtain a local candidate implementation for each role. 
If the global type is implementable, this construction always yields an 
implementation. The implementations shown in Figs. 1b to 1d are the result of applying 
this construction to $G_{oe}$ from Example 2.1. Notice that the state labels in Fig. 1d 
correspond to sets of labels in the global protocol. 
Unfortunately, not all global types are implementable. 


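Example 2.2. Consider the following four global types also depicted in Fig. 2: 

\[
\begin{aligned}
G_r &:= + \begin{cases} p \to q{:}o.\; q \to r{:}o.\; p \to r{:}o.\; 0 \\ p \to q{:}m.\; p \to r{:}o.\; q \to r{:}o.\; 0 \end{cases}
& G_s &:= + \begin{cases} p \to q{:}o.\; r \to q{:}o.\; 0 \\ p \to q{:}m.\; r \to q{:}m.\; 0 \end{cases} \\
G_r' &:= + \begin{cases} p \to q{:}o.\; q \to r{:}o.\; r \to p{:}o.\; p \to r{:}o.\; 0 \\ p \to q{:}m.\; p \to r{:}o.\; r \to q{:}o.\; q \to r{:}o.\; 0 \end{cases}
& G_s' &:= + \begin{cases} p \to q{:}o.\; r \to q{:}b.\; 0 \\ p \to q{:}m.\; r \to q{:}b.\; 0 \end{cases}
\end{aligned}
\]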

Similar to $G_{oe}$, in all four examples, p chooses a branch by sending either o or 
m to q. The global type $G_r$ is not implementable because r cannot learn which 
branch was chosen by p. For any local implementation of r to be able to execute 
both branches, it must be able to receive o from p and q in any order. Because 
the two send events $p \triangleright r!o$ and $q \triangleright r!o$ are independent 
of each other, they may be reordered. Consequently, any implementation of $G_r$ would 
have to permit executions that are consistent with global behaviors not described 
by $G_r$, such as $p \to q{:}m \cdot q \to r{:}o \cdot p \to r{:}o$. Contrast this 
with $G_r'$, which is implementable. In the top branch of $G_r'$, role p can only 
send to r after it has received from r, which prevents the reordering of the send 
events $p \triangleright r!o$ and $q \triangleright r!o$. The bottom branch 
is symmetric. Hence, r learns p’s choice based on which message it receives first. 

For the global type $G_s$, role r again cannot learn the branch chosen by p. 
That is, r cannot know whether to send o or m to q, leading inevitably to 
deadlocking executions. In contrast, $G_s'$ is again implementable because the expected 
behavior of r is independent of the choice by p. $\triangleleft$ 
These examples show that the implementability question is non-trivial. To 
check implementability, we present conditions that precisely characterize when 
the subset construction for G yields an implementation. 


Overview. The rest of the paper is organized as follows. Section 3 contains rel- 
evant definitions for our work. Section 4 describes the synthesis step of our pro- 
jection. Section 5 presents the two conditions that characterize implementability 
of a given global type. In Sect. 6, we prove soundness of our projection via a 
stronger inductive invariant guaranteeing per-role agreement on a global run of 
the protocol. In Sect. 7, we prove completeness by showing that our two condi- 
tions hold if a global type is implementable. In Sect. 8, we discuss the complexity 
of our construction and condition checks. Section 9 presents our artifact and eval- 
uation, and Sect. 10 as well as Sect. 11 discuss related work. Additional details 
including omitted proofs can be found in the extended version of the paper [29]. 


3 Preliminaries 


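Words. Let $\Sigma$ be a finite alphabet. $\Sigma^*$ denotes the set of finite words 
over $\Sigma$, $\Sigma^\omega$ the set of infinite words, and $\Sigma^\infty$ their 
union $\Sigma^* \cup \Sigma^\omega$. A word $u \in \Sigma^*$ is a prefix of word 
$v \in \Sigma^\infty$, denoted $u \leq v$, if there exists $w \in \Sigma^\infty$ 
with $u \cdot w = v$. 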


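Message Alphabet. Let $\mathcal{P}$ be a set of roles and $\mathcal{V}$ be a set of 
messages. We define the set of synchronous events 
$\Sigma_{sync} = \{p \to q{:}m \mid p, q \in \mathcal{P} \text{ and } m \in \mathcal{V}\}$ 
where $p \to q{:}m$ denotes that message m is sent by p to q atomically. This is split 
for asynchronous events. For a role $p \in \mathcal{P}$, we define the alphabet 
$\Sigma_{p,!} = \{p \triangleright q!m \mid q \in \mathcal{P}, m \in \mathcal{V}\}$ 
of send events and the alphabet 
$\Sigma_{p,?} = \{p \triangleleft q?m \mid q \in \mathcal{P}, m \in \mathcal{V}\}$ 
of receive events. The event $p \triangleright q!m$ denotes role p sending a message 
m to q, and $p \triangleleft q?m$ denotes role p receiving a message m from q. We write 
$\Sigma_p = \Sigma_{p,!} \cup \Sigma_{p,?}$, 
$\Sigma_! = \bigcup_{p \in \mathcal{P}} \Sigma_{p,!}$, and 
$\Sigma_? = \bigcup_{p \in \mathcal{P}} \Sigma_{p,?}$. Finally, 
$\Sigma_{async} = \Sigma_! \cup \Sigma_?$. We say that p is active in 
$x \in \Sigma_{async}$ if $x \in \Sigma_p$. For each role $p \in \mathcal{P}$, we define 
a homomorphism ${\Downarrow}_{\Sigma_p}$, where $x{\Downarrow}_{\Sigma_p} = x$ if 
$x \in \Sigma_p$ and $\varepsilon$ otherwise. We write $\mathcal{V}(w)$ to project the 
send and receive events in w onto their messages. We fix $\mathcal{P}$ and 
$\mathcal{V}$ in the rest of the paper. 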


Global Types - Syntax. Global types for MSTs [31] are defined by the grammar: 

\[
G ::= 0 \;\mid\; \sum_{i \in I} p \to q_i{:}m_i.\,G_i \;\mid\; \mu t.\,G \;\mid\; t
\]

where p, $q_i$ range over $\mathcal{P}$, $m_i$ over $\mathcal{V}$, and t over a set of 
recursion variables. 

We require each branch of a choice to be distinct: 
$\forall i, j \in I.\; i \neq j \Rightarrow (q_i, m_i) \neq (q_j, m_j)$, the sender and 
receiver of an atomic action to be distinct: $\forall i \in I.\; p \neq q_i$, 
and recursion to be guarded: in $\mu t.\,G$, there is at least one message between 
$\mu t$ and each t in G. When $|I| = 1$, we omit $\sum$. For readability, we sometimes 
use the infix operator + for choice, instead of $\sum$. When working with a protocol 
described by a global type, we write $\mathbf{G}$ to refer to the top-level type, and we 
use $G$ to refer to its subterms. For the size of a global type, we disregard multiple 
occurrences of the same subterm. 

We use the extended definition of global types from [31] that allows a sender 
to send messages to different roles in a choice. We call this sender-driven choice, 
as in [38], while it was called generalized choice in [31]. This definition subsumes 
classical MSTs that only allow directed choice [24]. The types we use focus on 
communication primitives and omit features like delegation or parametrization. 
We defer a detailed discussion of different MST frameworks to Sect. 11. 


Global Types - Semantics. As a basis for the semantics of a global type G, we 
construct a finite state machine 
$\mathsf{GAut}(G) = (Q_G, \Sigma_{sync}, \delta_G, q_{0,G}, F_G)$ where 

- $Q_G$ is the set of all syntactic subterms in G together with the term 0, 

- $\delta_G$ is the smallest set containing 
  $(\sum_{i \in I} p \to q_i{:}m_i.\,G_i,\; p \to q_i{:}m_i,\; G_i)$ for each 
  $i \in I$, as well as $(\mu t.\,G', \varepsilon, G')$ and $(t, \varepsilon, \mu t.\,G')$ 
  for each subterm $\mu t.\,G'$, 

- $q_{0,G} = G$ and $F_G = \{0\}$. 


We define a homomorphism split onto the asynchronous alphabet: 

\[
\mathsf{split}(p \to q{:}m) := p \triangleright q!m \,.\, q \triangleleft p?m .
\]

The semantics $\mathcal{L}(G)$ of a global type G is given by 
$\mathcal{C}^{\sim}(\mathsf{split}(\mathcal{L}(\mathsf{GAut}(G))))$ 
where $\mathcal{C}^{\sim}$ is the closure under the indistinguishability relation 
$\sim$ [31]. Two events are independent if they are not related by the 
happened-before relation [26]. 
For instance, any two send events from distinct senders are independent. Two 
words are indistinguishable if one can be reordered into the other by repeatedly 
swapping consecutive independent events. The full definition is in the extended 
version [29]. 
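
The construction of GAut(G) and the split homomorphism can be illustrated with a small Python sketch. The tuple encoding of global types, the function names `gaut`, `split`, and `words`, and the ASCII rendering of events (`p>q!m` for a send, `q<p?m` for a receive) are assumptions made for this illustration; they are not part of the paper or its prototype. The sketch enumerates split(L(GAut(G))) for an acyclic type only and does not compute the closure under the indistinguishability relation.

```python
# Global types as nested tuples (an encoding assumed for this sketch):
#   ("end",)                              the terminated protocol 0
#   ("choice", p, ((q, m, G), ...))       sender p picks a branch q_i : m_i . G_i
#   ("mu", t, G) and ("var", t)           recursion binder and recursion variable
END = ("end",)


def gaut(G):
    """Return (transitions, initial, final) of GAut(G); label None stands for epsilon."""
    binders, trans, seen = {}, set(), set()

    def visit(term):
        if term in seen:
            return
        seen.add(term)
        kind = term[0]
        if kind == "choice":
            _, p, branches = term
            for (q, m, cont) in branches:
                trans.add((term, (p, q, m), cont))   # one transition per branch
                visit(cont)
        elif kind == "mu":
            _, t, body = term
            binders[t] = term
            trans.add((term, None, body))            # (mu t.G', eps, G')
            visit(body)
        elif kind == "var":
            trans.add((term, None, binders[term[1]]))  # (t, eps, mu t.G')
        # kind == "end": no outgoing transitions

    visit(G)
    return trans, G, END


def split(event):
    """split(p -> q : m) = p>q!m . q<p?m (send event followed by receive event)."""
    p, q, m = event
    return [f"{p}>{q}!{m}", f"{q}<{p}?{m}"]


def words(trans, state, final):
    """Enumerate the finite words of an ACYCLIC GAut as lists of synchronous events."""
    if state == final:
        yield []
    for (src, label, dst) in trans:
        if src == state:
            for rest in words(trans, dst, final):
                yield ([label] if label is not None else []) + rest


# G_s from Example 2.2: p sends o or m to q, then r sends the same message to q.
G_s = ("choice", "p", (
    ("q", "o", ("choice", "r", (("q", "o", END),))),
    ("q", "m", ("choice", "r", (("q", "m", END),))),
))
trans, init, final = gaut(G_s)
async_words = {tuple(e for ev in w for e in split(ev))
               for w in words(trans, init, final)}
```

For `G_s`, the automaton has four transitions, and `async_words` contains one asynchronous word per branch. Recursive types would need a bounded or lazy enumeration instead of `words`.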


Communicating State Machine [11]. 
$\mathcal{A} = \{\!\!\{A_p\}\!\!\}_{p \in \mathcal{P}}$ is a CSM over $\mathcal{P}$ and 
$\mathcal{V}$ if $A_p$ is a finite state machine over $\Sigma_p$ for every 
$p \in \mathcal{P}$, denoted by $(Q_p, \Sigma_p, \delta_p, q_{0,p}, F_p)$. 
Let $\prod_{p \in \mathcal{P}} s_p$ denote the set of global states and 
$\mathsf{Chan} = \{(p, q) \mid p, q \in \mathcal{P},\; p \neq q\}$ 
denote the set of channels. A configuration of $\mathcal{A}$ is a pair 
$(\vec{s}, \xi)$, where $\vec{s}$ is a global state and 
$\xi : \mathsf{Chan} \to \mathcal{V}^*$ is a mapping from each channel to a sequence of 
messages. We use $\vec{s}_p$ to denote the state of p in $\vec{s}$. The CSM transition 
relation, denoted $\to$, is defined as follows. 

- $(\vec{s}, \xi) \xrightarrow{p \triangleright q!m} (\vec{s}', \xi')$ if 
  $(\vec{s}_p,\; p \triangleright q!m,\; \vec{s}'_p) \in \delta_p$, 
  $\vec{s}_r = \vec{s}'_r$ for every role $r \neq p$, 
  $\xi'(p, q) = \xi(p, q) \cdot m$ and $\xi'(c) = \xi(c)$ for every other channel 
  $c \in \mathsf{Chan}$. 

- $(\vec{s}, \xi) \xrightarrow{q \triangleleft p?m} (\vec{s}', \xi')$ if 
  $(\vec{s}_q,\; q \triangleleft p?m,\; \vec{s}'_q) \in \delta_q$, 
  $\vec{s}_r = \vec{s}'_r$ for every role $r \neq q$, 
  $\xi(p, q) = m \cdot \xi'(p, q)$ and $\xi'(c) = \xi(c)$ for every other channel 
  $c \in \mathsf{Chan}$. 

In the initial configuration $(\vec{s}_0, \xi_0)$, each role’s state in $\vec{s}_0$ is 
the initial state $q_{0,p}$ of $A_p$, and $\xi_0$ maps each channel to $\varepsilon$. 
A configuration $(\vec{s}, \xi)$ is said to be final iff $\vec{s}_p$ is final for every 
p and $\xi$ maps each channel to $\varepsilon$. Runs and traces are defined 
in the expected way. A run is maximal if either it is finite and ends in a final 
configuration, or it is infinite. The language $\mathcal{L}(\mathcal{A})$ of the CSM 
$\mathcal{A}$ is defined as the set of maximal traces. A configuration 
$(\vec{s}, \xi)$ is a deadlock if it is not final and has no outgoing transitions. 
A CSM is deadlock-free if no reachable configuration is a deadlock. 
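
The CSM semantics above can be sketched as a small explicit-state explorer. The encoding below (events as tuples `("!", p, q, m)` and `("?", q, p, m)`, the functions `step` and `find_deadlocks`, and the exploration bound `limit`) is an assumption made for illustration, not the paper's tooling. The three local machines are written out by hand to behave like the subset construction for $G_s$ from Example 2.2, with states renamed to integers.

```python
from collections import deque


def step(machines, config):
    """Yield (event, successor) for every transition enabled in a configuration.

    machines: role -> set of (state, event, state'). An event is
      ("!", p, q, m): p sends m to q, or ("?", q, p, m): q receives m from p.
    config: (states, chans), both stored as sorted tuples of pairs so that
      configurations are hashable; empty channels are omitted from chans.
    """
    states, chans = dict(config[0]), dict(config[1])
    for role, delta in machines.items():
        for (src, ev, dst) in delta:
            if src != states[role]:
                continue
            kind, a, b, m = ev
            ch = (a, b) if kind == "!" else (b, a)    # channel is (sender, receiver)
            buf = chans.get(ch, ())
            if kind == "!":
                new_buf = buf + (m,)                   # append to the FIFO channel
            elif buf[:1] == (m,):
                new_buf = buf[1:]                      # consume the head of the channel
            else:
                continue                               # receive is not enabled
            new_states = {**states, role: dst}
            new_chans = {c: w for c, w in {**chans, ch: new_buf}.items() if w}
            yield ev, (tuple(sorted(new_states.items())),
                       tuple(sorted(new_chans.items())))


def find_deadlocks(machines, initial, finals, limit=10_000):
    """Breadth-first search for non-final configurations without successors."""
    start = (tuple(sorted(initial.items())), ())
    seen, queue, dead = {start}, deque([start]), []
    while queue and len(seen) <= limit:
        config = queue.popleft()
        succs = [nxt for _, nxt in step(machines, config)]
        is_final = not config[1] and all(s in finals[r] for r, s in config[0])
        if not succs and not is_final:
            dead.append(config)
        for nxt in succs:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return dead


# Hand-written local machines for G_s: r may send o or m regardless of p's choice.
machines = {
    "p": {(0, ("!", "p", "q", "o"), 1), (0, ("!", "p", "q", "m"), 1)},
    "q": {(0, ("?", "q", "p", "o"), 1), (0, ("?", "q", "p", "m"), 2),
          (1, ("?", "q", "r", "o"), 3), (2, ("?", "q", "r", "m"), 3)},
    "r": {(0, ("!", "r", "q", "o"), 1), (0, ("!", "r", "q", "m"), 1)},
}
initial = {"p": 0, "q": 0, "r": 0}
finals = {"p": {1}, "q": {3}, "r": {1}}
deadlocks = find_deadlocks(machines, initial, finals)
```

The two deadlocks found are the configurations in which q has received one message from p while the channel from r to q holds the other message. Because channels are unbounded in general, the search is cut off after `limit` configurations; that bound is a design choice of this sketch.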


Finally, implementability is formalized as follows. 


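Definition 3.1 (Implementability [31]). A global type G is implementable 
if there exists a CSM $\{\!\!\{A_p\}\!\!\}_{p \in \mathcal{P}}$ such that the following 
two properties hold: 

(i) protocol fidelity: 
$\mathcal{L}(\{\!\!\{A_p\}\!\!\}_{p \in \mathcal{P}}) = \mathcal{L}(G)$, and 
(ii) deadlock freedom: $\{\!\!\{A_p\}\!\!\}_{p \in \mathcal{P}}$ is deadlock-free. 
We say that $\{\!\!\{A_p\}\!\!\}_{p \in \mathcal{P}}$ implements G. 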


4 Synthesizing Implementations 


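The construction is carried out in two steps. First, for each role 
$p \in \mathcal{P}$, we define an intermediate state machine 
$\mathsf{GAut}(G){\downarrow}_p$ that is a homomorphism of $\mathsf{GAut}(G)$. 
We call $\mathsf{GAut}(G){\downarrow}_p$ the projection by erasure for p, defined below. 


Definition 4.1 (Projection by Erasure). Let G be some global type with 
its state machine $\mathsf{GAut}(G) = (Q_G, \Sigma_{sync}, \delta_G, q_{0,G}, F_G)$. 
For each role $p \in \mathcal{P}$, we define the state machine 
$\mathsf{GAut}(G){\downarrow}_p = (Q_G, \Sigma_p \uplus \{\varepsilon\}, \delta_{\downarrow}, q_{0,G}, F_G)$ 
where 

\[
\delta_{\downarrow} := \{\, q \xrightarrow{\mathsf{split}(a){\Downarrow}_{\Sigma_p}} q' \;\mid\; q \xrightarrow{a} q' \in \delta_G \,\} .
\]

By definition of $\mathsf{split}(-)$, it holds that 
$\mathsf{split}(a){\Downarrow}_{\Sigma_p} \in \Sigma_p \uplus \{\varepsilon\}$. 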


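Then, we determinize $\mathsf{GAut}(G){\downarrow}_p$ via a standard subset 
construction to obtain a deterministic local state machine for p. 


Definition 4.2 (Subset Construction). Let G be a global type and p be a 
role. Then, the subset construction for p is defined as 

\[
\mathscr{C}(G, p) = (Q_p, \Sigma_p, \delta_p, s_{0,p}, F_p) \quad \text{where}
\]

- $\delta(s, a) := \{\, q' \in Q_G \mid \exists q \in s.\; q \xrightarrow{a} \xrightarrow{\varepsilon}{}^{*} q' \in \delta_{\downarrow} \,\}$, 
  for every $s \subseteq Q_G$ and $a \in \Sigma_p$, 
- $s_{0,p} := \{\, q \in Q_G \mid q_{0,G} \xrightarrow{\varepsilon}{}^{*} q \in \delta_{\downarrow} \,\}$, 
- $Q_p := \mathrm{lfp}^{\subseteq}_{\{s_{0,p}\}}\, \lambda Q.\; Q \cup \{\, \delta(s, a) \mid s \in Q \wedge a \in \Sigma_p \,\} \setminus \{\emptyset\}$, 
- $\delta_p := \delta|_{Q_p \times \Sigma_p}$, and 
- $F_p := \{\, s \in Q_p \mid s \cap F_G \neq \emptyset \,\}$. 


Note that the construction ensures that $Q_p$ only contains subsets of $Q_G$ whose 
states are reachable via the same traces, i.e. we typically have $|Q_p| \ll 2^{|Q_G|}$. 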
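
Definitions 4.1 and 4.2 can be sketched in Python as follows. The automaton for $G_s$ is given explicitly with hypothetical state names (`"Gs"`, `"top"`, `"bot"`, `"end"`), and the function names and the ASCII event syntax are assumptions made for this sketch rather than the paper's implementation.

```python
def erase(sync_trans, role):
    """Projection by erasure: relabel each transition by the role's view of it."""
    erased = set()
    for (src, label, dst) in sync_trans:
        view = None                              # epsilon unless the role takes part
        if label is not None:
            p, q, m = label
            if role == p:
                view = f"{p}>{q}!{m}"            # the role is the sender
            elif role == q:
                view = f"{q}<{p}?{m}"            # the role is the receiver
        erased.add((src, view, dst))
    return erased


def eps_closure(states, erased):
    """All states reachable from `states` via epsilon (None-labelled) transitions."""
    closure, todo = set(states), list(states)
    while todo:
        s = todo.pop()
        for (src, label, dst) in erased:
            if src == s and label is None and dst not in closure:
                closure.add(dst)
                todo.append(dst)
    return frozenset(closure)


def subset_construction(sync_trans, initial, finals, role):
    """Return (states, delta, s0, accepting) of the determinized local machine."""
    erased = erase(sync_trans, role)
    s0 = eps_closure({initial}, erased)
    states, delta, todo = {s0}, {}, [s0]
    while todo:                                   # least fixpoint from {s0}
        s = todo.pop()
        labels = {lab for (src, lab, _) in erased if src in s and lab is not None}
        for a in labels:                          # only non-empty successors arise
            after_a = {dst for (src, lab, dst) in erased if src in s and lab == a}
            target = eps_closure(after_a, erased)
            delta[(s, a)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    accepting = {s for s in states if s & finals}
    return states, delta, s0, accepting


# GAut(G_s) for G_s from Example 2.2, with hypothetical state names.
SYNC = {
    ("Gs", ("p", "q", "o"), "top"), ("Gs", ("p", "q", "m"), "bot"),
    ("top", ("r", "q", "o"), "end"), ("bot", ("r", "q", "m"), "end"),
}
states_r, delta_r, s0_r, acc_r = subset_construction(SYNC, "Gs", {"end"}, "r")
states_q, delta_q, s0_q, acc_q = subset_construction(SYNC, "Gs", {"end"}, "q")
```

For role r, the initial subset state collects the initial state and both branch continuations, because p's choice is invisible to r. For role q, each receive from p leads to a distinct singleton state.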

The following characterization is immediate from the subset construction; 
the proof can be found in the extended version [29]. 


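Lemma 4.3. Let G be a global type, r be a role, and $\mathscr{C}(G, r)$ be its subset 
construction. If w is a trace of $\mathsf{GAut}(G)$, 
$\mathsf{split}(w){\Downarrow}_{\Sigma_r}$ is a trace of $\mathscr{C}(G, r)$. If u 
is a trace of $\mathscr{C}(G, r)$, there is a trace w of $\mathsf{GAut}(G)$ such that 
$\mathsf{split}(w){\Downarrow}_{\Sigma_r} = u$. 
It holds that $\mathcal{L}(G){\Downarrow}_{\Sigma_r} = \mathcal{L}(\mathscr{C}(G, r))$. 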
Using this lemma, we show that the CSM 
$\{\!\!\{\mathscr{C}(G, p)\}\!\!\}_{p \in \mathcal{P}}$ preserves all behaviors of G. 


Lemma 4.4. For all global types G, 
$\mathcal{L}(G) \subseteq \mathcal{L}(\{\!\!\{\mathscr{C}(G, p)\}\!\!\}_{p \in \mathcal{P}})$. 


We briefly sketch the proof here. Given that 
$\{\!\!\{\mathscr{C}(G, p)\}\!\!\}_{p \in \mathcal{P}}$ is deterministic, to 
prove language inclusion it suffices to prove the inclusion of the respective prefix 
sets: 

\[
\mathrm{pref}(\mathcal{L}(G)) \subseteq \mathrm{pref}(\mathcal{L}(\{\!\!\{\mathscr{C}(G, p)\}\!\!\}_{p \in \mathcal{P}}))
\]

Let w be a word in $\mathcal{L}(G)$. If w is finite, membership in 
$\mathcal{L}(\{\!\!\{\mathscr{C}(G, p)\}\!\!\}_{p \in \mathcal{P}})$ is immediate 
from the claim above. If w is infinite, we show that w has an infinite run in 
$\{\!\!\{\mathscr{C}(G, p)\}\!\!\}_{p \in \mathcal{P}}$ using König’s Lemma. We 
construct an infinite graph $\mathcal{G}_w(V, E)$ with 
$V := \{v_\rho \mid \mathsf{trace}(\rho) \leq w\}$ and 
$E := \{(v_{\rho_1}, v_{\rho_2}) \mid \exists x \in \Sigma_{async}.\; \mathsf{trace}(\rho_2) = \mathsf{trace}(\rho_1) \cdot x\}$. 
Because $\{\!\!\{\mathscr{C}(G, p)\}\!\!\}_{p \in \mathcal{P}}$ is deterministic, 
$\mathcal{G}_w$ is a tree rooted at $v_\varepsilon$, the vertex corresponding to the 
empty run. By König’s Lemma, every infinite tree contains either a vertex of infinite 
degree or an infinite path. Because 
$\{\!\!\{\mathscr{C}(G, p)\}\!\!\}_{p \in \mathcal{P}}$ consists of a finite number of 
communicating state machines, the last configuration of any run has a finite number of 
next configurations, and $\mathcal{G}_w$ is finitely branching. Therefore, there must 
exist an infinite path in $\mathcal{G}_w$ representing an infinite run for w, and thus 
$w \in \mathcal{L}(\{\!\!\{\mathscr{C}(G, p)\}\!\!\}_{p \in \mathcal{P}})$. 

The proof of the inclusion of prefix sets proceeds by structural induction and 
primarily relies on Lemma 4.3 and the fact that all prefixes in L(G) respect the 
order of send before receive events. 


5 Checking Implementability 


We now turn our attention to checking implementability of a CSM produced 
by the subset construction. We revisit the global types from Example 2.2 (also 
shown in Fig. 2), which demonstrate that the naive subset construction does 
not always yield a sound implementation. From these examples, we distill our 
conditions that precisely identify the implementable global types. 

In general, a global type G is not implementable when the agreement on 
a global run of GAut(G) among all participating roles cannot be conveyed via 
sending and receiving messages alone. When this happens, roles can take locally 
permitted transitions that commit to incompatible global runs, resulting in a 
trace that is not specified by G. Consequently, our conditions need to ensure 
that when a role p takes a transition in $\mathscr{C}(G, p)$, it only commits to global runs 
that are consistent with the local views of all other roles. We discuss the relevant 
conditions imposed on send and receive transitions separately. 


Send Validity. Consider $G_s$ from Example 2.2. The CSM 
$\{\!\!\{\mathscr{C}(G_s, p)\}\!\!\}_{p \in \mathcal{P}}$ has an execution with the 
trace $p \triangleright q!o \cdot q \triangleleft p?o \cdot r \triangleright q!m$. 
This trace is possible because the initial state of $\mathscr{C}(G_s, r)$, $s_{0,r}$, 
contains two states of $\mathsf{GAut}(G_s){\downarrow}_r$, each of which 
has a single outgoing send transition labeled with $r \triangleright q!o$ and 
$r \triangleright q!m$ respectively. 
Both of these transitions are always enabled in $s_{0,r}$, meaning that r can send 
$r \triangleright q!m$ even when p has chosen the top branch and q expects to receive o 
instead of m from r. This results in a deadlock. In contrast, while the state 
$s_{0,r}$ in $\mathscr{C}(G_s', r)$ likewise contains two states of 
$\mathsf{GAut}(G_s'){\downarrow}_r$, each with a single outgoing send transition, now 
both transitions are labeled with $r \triangleright q!b$. These 
two transitions collapse to a single one in $\mathscr{C}(G_s', r)$. This transition is 
consistent with both possible local views that p and q might hold on the global run. 

Intuitively, to prevent the emergence of inconsistent local views from send 
transitions of $\mathscr{C}(G, p)$, we must enforce that for every state $s \in Q_p$ 
with an outgoing send transition labeled x, a transition labeled x must be enabled in 
all states of $\mathsf{GAut}(G){\downarrow}_p$ represented by s. We use the following 
auxiliary definition to formalize this intuition subsequently. 


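Definition 5.1 (Transition Origin and Destination). Let 
$s \xrightarrow{x} s' \in \delta_p$ be a transition in $\mathscr{C}(G, p)$ and 
$\delta_{\downarrow}$ be the transition relation of $\mathsf{GAut}(G){\downarrow}_p$. 
We define the set of transition origins $\text{tr-orig}(s \xrightarrow{x} s')$ and 
transition destinations $\text{tr-dest}(s \xrightarrow{x} s')$ as follows: 

\[
\begin{aligned}
\text{tr-orig}(s \xrightarrow{x} s') &:= \{\, G \in s \mid \exists G' \in s'.\; G \xrightarrow{x}{}^{*} G' \in \delta_{\downarrow} \,\} \quad \text{and} \\
\text{tr-dest}(s \xrightarrow{x} s') &:= \{\, G' \in s' \mid \exists G \in s.\; G \xrightarrow{x}{}^{*} G' \in \delta_{\downarrow} \,\} .
\end{aligned}
\]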


Our condition on send transitions is then stated below. 


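Definition 5.2 (Send Validity). $\mathscr{C}(G, p)$ satisfies Send Validity iff every 
send transition $s \xrightarrow{x} s' \in \delta_p$ is enabled in all states contained 
in s: 

\[
\forall s \xrightarrow{x} s' \in \delta_p.\; x \in \Sigma_{p,!} \implies \text{tr-orig}(s \xrightarrow{x} s') = s .
\]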
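
A minimal sketch of the Send Validity check on the two examples discussed above follows. It reads the starred arrow in Definition 5.1 as zero or more $\varepsilon$-steps followed by one x-step (and then further $\varepsilon$-steps), which is an assumption about notation that this excerpt does not spell out. It also assumes that s is $\varepsilon$-closed and that $s' = \delta(s, x)$, so membership of the destination in $s'$ need not be rechecked. State names and the ASCII event syntax are hypothetical.

```python
def reach_eps(state, erased):
    """States reachable from `state` via zero or more epsilon (None) transitions."""
    seen, todo = {state}, [state]
    while todo:
        cur = todo.pop()
        for (src, label, dst) in erased:
            if src == cur and label is None and dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return seen


def tr_orig(s, x, erased):
    """States of s from which x can be taken after zero or more epsilon-steps."""
    return {g for g in s
            if any(label == x and src in reach_eps(g, erased)
                   for (src, label, _) in erased)}


def send_valid(s, x, erased):
    """Send Validity for one send transition: every state in s must be an origin."""
    return tr_orig(s, x, erased) == set(s)


# Projection by erasure onto r for G_s (r sends o or m) and G_s' (r always sends b).
ERASED_GS = {("Gs", None, "top"), ("Gs", None, "bot"),
             ("top", "r>q!o", "end"), ("bot", "r>q!m", "end")}
ERASED_GS_PRIME = {("Gs'", None, "top"), ("Gs'", None, "bot"),
                   ("top", "r>q!b", "end"), ("bot", "r>q!b", "end")}

s0_bad = frozenset({"Gs", "top", "bot"})      # initial subset state of r for G_s
s0_good = frozenset({"Gs'", "top", "bot"})    # initial subset state of r for G_s'

bad_origins = tr_orig(s0_bad, "r>q!o", ERASED_GS)
gs_valid = send_valid(s0_bad, "r>q!o", ERASED_GS)
gs_prime_valid = send_valid(s0_good, "r>q!b", ERASED_GS_PRIME)
```

For $G_s$, the bottom-branch state is not an origin of the transition labeled `r>q!o`, so the check fails; for $G_s'$ every state in the initial subset state can send b, so it succeeds.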


Receive Validity. To motivate our condition on receive transitions, let us revisit 
$G_r$ from Example 2.2. The CSM 
$\{\!\!\{\mathscr{C}(G_r, p)\}\!\!\}_{p \in \mathcal{P}}$ recognizes the following trace 
not in the global type language $\mathcal{L}(G_r)$: 

\[
p \triangleright q!o \cdot q \triangleleft p?o \cdot q \triangleright r!o \cdot p \triangleright r!o \cdot r \triangleleft p?o \cdot r \triangleleft q?o .
\]

The issue lies with r which cannot distinguish between the two branches in $G_r$. 
The initial state $s_{0,r}$ of $\mathscr{C}(G_r, r)$ has two states of 
$\mathsf{GAut}(G_r)$ corresponding to the subterms 
$G_t := q \to r{:}o.\; p \to r{:}o.\; 0$ and $G_b := p \to r{:}o.\; q \to r{:}o.\; 0$. 
Here, $G_t$ and $G_b$ are the top and bottom branch of $G_r$ respectively. This means 
that there are outgoing transitions in $s_{0,r}$ labeled with $r \triangleleft p?o$ and 
$r \triangleleft q?o$. If r takes the transition labeled $r \triangleleft p?o$, it 
commits to the bottom branch $G_b$. However, observe that the message o from p can also 
be available at this time point if the other roles follow the top branch $G_t$. This is 
because p can send o to r without waiting for r to first receive from q. In this 
scenario, the roles disagree on which global run of $\mathsf{GAut}(G_r)$ to follow, 
resulting in the violating trace above. 
Contrast this with $G_r'$. Here, $s_{0,r}$ again has outgoing transitions labeled with 
$r \triangleleft p?o$ and $r \triangleleft q?o$. However, if r takes the transition 
labeled $r \triangleleft p?o$, committing to the bottom branch, no disagreement occurs. 
This is because if the other roles are following the top branch, then p is blocked from 
sending to r until after it has received confirmation that r has received its first 
message from q. 

For a receive transition $s \xrightarrow{x} s_1$ in $\mathscr{C}(G, p)$ to be safe, we 
must enforce that the receive event x cannot also be available due to reordered sent 
messages in the continuation $G_2 \in s_2$ of another outgoing receive transition 
$s \xrightarrow{y} s_2$. To formalize this condition, we use the set 
$M^{\mathcal{B}}_{(G\ldots)}$ of available messages for a 
syntactic subterm $G$ of $\mathbf{G}$ and a set of blocked roles $\mathcal{B}$. This 
notion was already defined in [31, Sec. 2.2]. Intuitively, 
$M^{\mathcal{B}}_{(G\ldots)}$ consists of all send events $q \triangleright r!m$ 
that can occur on the traces of G such that m will be the first message added 
to channel (q, r) before any of the roles in $\mathcal{B}$ takes a step. 


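Available Messages. The set of available messages is recursively defined on the 
structure of the global type. To obtain all possible messages, we need to unfold 
the distinct recursion variables once. For this, we define a map $\mathit{get}\mu$ from 
variable to subterms and write $\mathit{get}\mu_{\mathbf{G}}$ for 
$\mathit{get}\mu(\mathbf{G})$: 

\[
\begin{aligned}
\mathit{get}\mu(0) &:= [\,] \qquad \mathit{get}\mu(t) := [\,] \qquad \mathit{get}\mu(\mu t.\,G) := [t \mapsto G] \cup \mathit{get}\mu(G) \\
\mathit{get}\mu\Big(\sum_{i \in I} p \to q_i{:}m_i.\,G_i\Big) &:= \bigcup_{i \in I} \mathit{get}\mu(G_i)
\end{aligned}
\]

The function $M^{\mathcal{B},T}_{(-\ldots)}$ keeps a set of unfolded variables T, which 
is empty initially. 

\[
\begin{aligned}
M^{\mathcal{B},T}_{(0\ldots)} &:= \emptyset \qquad\qquad M^{\mathcal{B},T}_{(\mu t.G\ldots)} := M^{\mathcal{B},T \cup \{t\}}_{(G\ldots)} \\
M^{\mathcal{B},T}_{(t\ldots)} &:= \begin{cases} \emptyset & \text{if } t \in T \\ M^{\mathcal{B},T \cup \{t\}}_{(\mathit{get}\mu_{\mathbf{G}}(t)\ldots)} & \text{if } t \notin T \end{cases} \\
M^{\mathcal{B},T}_{(\sum_{i \in I} p \to q_i{:}m_i.G_i\ldots)} &:= \begin{cases} \bigcup_{i \in I,\, m \in \mathcal{V}} \big(M^{\mathcal{B},T}_{(G_i\ldots)} \setminus \{q_i \triangleleft p?m\}\big) \cup \{q_i \triangleleft p?m_i\} & \text{if } p \notin \mathcal{B} \\ \bigcup_{i \in I} M^{\mathcal{B} \cup \{q_i\},T}_{(G_i\ldots)} & \text{if } p \in \mathcal{B} \end{cases}
\end{aligned}
\]

We write $M^{\mathcal{B}}_{(G\ldots)}$ for $M^{\mathcal{B},\emptyset}_{(G\ldots)}$. If 
$\mathcal{B}$ is a singleton set, we omit set notation and write $M^{p}_{(G\ldots)}$ for 
$M^{\{p\}}_{(G\ldots)}$. The set of available messages captures the possible 
states of all channels before a given receive transition is taken. 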
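
The recursive definition can be sketched in Python. To stay neutral about event notation, each available message is recorded as a `(sender, receiver, message)` triple; this encoding, the tuple representation of global types, and the names `get_mu` and `available` are assumptions made for this sketch. The two test terms are the continuations that role r reaches in $G_r$ and $G_r'$ after receiving from p.

```python
# Global types as nested tuples (an encoding assumed for this sketch):
#   ("end",), ("choice", p, ((q, m, G), ...)), ("mu", t, G), ("var", t)
END = ("end",)


def get_mu(G):
    """Map each recursion variable t to the body G of its binder mu t. G."""
    kind = G[0]
    if kind == "mu":
        return {G[1]: G[2], **get_mu(G[2])}
    if kind == "choice":
        out = {}
        for (_, _, cont) in G[2]:
            out.update(get_mu(cont))
        return out
    return {}                                   # "end" and "var" bind nothing


def available(G, blocked, mu, unfolded=frozenset()):
    """Available messages of G as (sender, receiver, message) triples.

    blocked: roles that may not take a step; mu: the map computed by get_mu
    on the top-level type; unfolded: recursion variables already unfolded.
    """
    kind = G[0]
    if kind == "end":
        return set()
    if kind == "mu":
        return available(G[2], blocked, mu, unfolded | {G[1]})
    if kind == "var":
        t = G[1]
        if t in unfolded:
            return set()                        # unfold each variable only once
        return available(mu[t], blocked, mu, unfolded | {t})
    _, p, branches = G
    result = set()
    for (q, m, cont) in branches:
        if p in blocked:
            # A blocked sender cannot send, so its receiver becomes blocked too.
            result |= available(cont, blocked | {q}, mu, unfolded)
        else:
            later = available(cont, blocked, mu, unfolded)
            # Only the FIRST message on channel (p, q) is available.
            result |= {x for x in later if x[:2] != (p, q)} | {(p, q, m)}
    return result


# Continuation in G_r after r receives from p: q -> r : o . 0
cont_Gr = ("choice", "q", (("r", "o", END),))
# Continuation in G_r' after r receives from p: r -> q : o . q -> r : o . 0
cont_Gr_prime = ("choice", "r", (("q", "o", ("choice", "q", (("r", "o", END),))),))

avail_Gr = available(cont_Gr, frozenset({"r"}), {})
avail_Gr_prime = available(cont_Gr_prime, frozenset({"r"}), {})

# A loop: mu t. (p -> q : o . t + p -> q : b . 0)
loop = ("mu", "t", ("choice", "p", (("q", "o", ("var", "t")), ("q", "b", END))))
avail_loop = available(loop, frozenset(), get_mu(loop))
```

With r blocked, the message o from q to r is available in the continuation of $G_r$, which is what makes r's choice between the two receives unsafe. In $G_r'$ the corresponding set is empty, because q must wait for the blocked role r.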
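Definition 5.3 (Receive Validity). $\mathscr{C}(G, p)$ satisfies Receive Validity iff no 
receive transition is enabled in an alternative continuation that originates from 
the same source state: 

\[
\begin{aligned}
&\forall s \xrightarrow{p \triangleleft q_1?m_1} s_1,\; s \xrightarrow{p \triangleleft q_2?m_2} s_2 \in \delta_p. \\
&\quad q_1 \neq q_2 \implies \forall G_2 \in \text{tr-dest}(s \xrightarrow{p \triangleleft q_2?m_2} s_2).\; q_1 \triangleright p!m_1 \notin M^{p}_{(G_2\ldots)} .
\end{aligned}
\]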


Subset Projection. We are now ready to define our projection operator. 


Definition 5.4 (Subset Projection of G). The subset projection $\mathscr{P}(G, p)$ 
of G onto p is $\mathscr{C}(G, p)$ if it satisfies Send Validity and Receive Validity. 
We lift this operation to a partial function from global types to CSMs in the expected way. 


We conclude our discussion with an observation about the syntactic structure 
of the subset projection: Send Validity implies that no state has both outgoing 
send and receive transitions (also known as mixed choice). 


Corollary 5.5 (No Mixed Choice). If $\mathscr{P}(G, p)$ satisfies Send Validity, then 
for all $s \xrightarrow{x_1} s_1,\; s \xrightarrow{x_2} s_2 \in \delta_p$, 
$x_1 \in \Sigma_!$ iff $x_2 \in \Sigma_!$. 


6 Soundness 


In this section, we prove the soundness of our subset projection, stated as follows. 


Theorem 6.1. Let G be a global type and 
$\{\!\!\{\mathscr{P}(G, p)\}\!\!\}_{p \in \mathcal{P}}$ be the subset projection. 
Then, $\{\!\!\{\mathscr{P}(G, p)\}\!\!\}_{p \in \mathcal{P}}$ implements G. 


Recall that implementability is defined as protocol fidelity and deadlock free- 
dom. Protocol fidelity consists of two language inclusions. The first inclusion, 
$\mathcal{L}(G) \subseteq \mathcal{L}(\{\!\!\{\mathscr{P}(G, p)\}\!\!\}_{p \in \mathcal{P}})$, enforces that the subset projection generates at 
least all behaviors of the global type. We showed in Lemma 4.4 that this holds 
for the subset construction alone (without Send and Receive Validity). 

The second inclusion, $\mathcal{L}(\{\!\!\{\mathscr{P}(G, p)\}\!\!\}_{p \in \mathcal{P}}) \subseteq \mathcal{L}(G)$, enforces that no new 
behaviors are introduced. The proof of this direction relies on a stronger induc- 
tive invariant that we show for all traces of the subset projection. As discussed 
in Sect. 5, violations of implementability occur when roles commit to global runs 
that are inconsistent with the local views of other roles. Our inductive invariant 
states the exact opposite: that all local views are consistent with one another. 
First, we formalize the local view of a role. 


Definition 6.2 (Possible run sets). Let G be a global type and $\mathsf{GAut}(G)$ be 
the corresponding state machine. Let p be a role and $w \in \Sigma^*_{async}$ be a 
word. We define the set of possible runs $R^G_p(w)$ as all maximal runs of 
$\mathsf{GAut}(G)$ that are consistent with p’s local view of w: 

\[
R^G_p(w) := \{\, \rho \text{ is a maximal run of } \mathsf{GAut}(G) \mid w{\Downarrow}_{\Sigma_p} \leq \mathsf{split}(\mathsf{trace}(\rho)){\Downarrow}_{\Sigma_p} \,\} .
\]

While Definition 6.2 captures the set of maximal runs that are consistent 
with the local view of a single role, we would like to refer to the set of runs that 
is consistent with the local view of all roles. We formalize this as the intersection 
of the possible run sets for all roles, which we denote as 

\[
I(w) := \bigcap_{p \in \mathcal{P}} R^G_p(w) .
\]

With these definitions in hand, we can now formulate our inductive invariant: 


Lemma 6.3. Let G be a global type and 
$\{\!\!\{\mathscr{P}(G, p)\}\!\!\}_{p \in \mathcal{P}}$ be the subset projection. 
Let w be a trace of $\{\!\!\{\mathscr{P}(G, p)\}\!\!\}_{p \in \mathcal{P}}$. It holds 
that $I(w)$ is non-empty. 
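
To make the possible run sets concrete, the following sketch computes them for $G_s$ from Example 2.2, whose subset projection is undefined. The word used is the trace of the subset construction discussed in Sect. 5, so the lemma does not apply to it; the example shows the kind of disagreement the invariant rules out. The representation of runs as lists of synchronous events, the ASCII event syntax, and the restriction to single-character role names are assumptions of this sketch.

```python
def view(run, role):
    """split(trace(run)) projected onto the role's events; run is a list of (p, q, m)."""
    events = []
    for (p, q, m) in run:
        if role == p:
            events.append(f"{p}>{q}!{m}")     # the role's send event
        if role == q:
            events.append(f"{q}<{p}?{m}")     # the role's receive event
    return events


def local(word, role):
    """Project an asynchronous word onto the role's events.

    Assumes single-character role names, so an event's first character
    identifies the active role.
    """
    return [e for e in word if e[0] == role]


def possible_runs(runs, role, word):
    """Indices of the maximal runs whose local view extends the role's view of word."""
    w = local(word, role)
    return {i for i, run in enumerate(runs) if view(run, role)[:len(w)] == w}


def intersection(runs, roles, word):
    """I(word): the runs that are possible for every role."""
    return set.intersection(*(possible_runs(runs, r, word) for r in roles))


# The two maximal runs of GAut(G_s): index 0 is the top branch, 1 the bottom one.
RUNS = [
    [("p", "q", "o"), ("r", "q", "o")],
    [("p", "q", "m"), ("r", "q", "m")],
]
ROLES = ["p", "q", "r"]

bad_word = ["p>q!o", "q<p?o", "r>q!m"]    # r sends m although p chose the top branch
good_word = ["p>q!o", "q<p?o", "r>q!o"]

r_view = possible_runs(RUNS, "r", bad_word)
i_bad = intersection(RUNS, ROLES, bad_word)
i_good = intersection(RUNS, ROLES, good_word)
i_empty_word = intersection(RUNS, ROLES, [])
```

After `bad_word`, roles p and q are committed to the top run while r is committed to the bottom run, so the intersection is empty. The empty word is consistent with both runs.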


The reasoning for the sufficiency of Lemma 6.3 is included in the proof of 
Theorem 6.1, found in the extended version [29]. In the rest of this section, 
we focus our efforts on how to show this inductive invariant, namely that the 
intersection of all roles’ possible run sets is non-empty. 

We begin with the observation that the empty trace $\varepsilon$ is consistent with all 
runs. As a result, $I(\varepsilon) = \bigcap_{p \in \mathcal{P}} R^G_p(\varepsilon)$ 
contains all maximal runs in $\mathsf{GAut}(G)$. By 
definition, state machines for global types include at least one run, and the base 
case is trivially discharged. Intuitively, $I(w)$ shrinks as more events are appended 
to w, but we show that at no point does it shrink to $\emptyset$. We consider the cases 
where a send or receive event is appended to the trace separately, and show that 
the intersection set shrinks in a principled way that preserves non-emptiness. In 
fact, when a trace is extended with a receive event, Receive Validity guarantees 
that the intersection set does not shrink at all. 


Fig. 3. Evolution of $R^G_{-}(-)$ sets when p sends a message m and q receives it. 
Panel labels: $x = p \triangleright q!m$ with $w \in \Sigma^*_{async}$; 
$y = q \triangleleft p?m$. 


Lemma 6.4. Let G be a global type and 
$\{\!\!\{\mathscr{P}(G, p)\}\!\!\}_{p \in \mathcal{P}}$ be the subset projection. 
Let wx be a trace of $\{\!\!\{\mathscr{P}(G, p)\}\!\!\}_{p \in \mathcal{P}}$ such that 
$x \in \Sigma_?$. Then, $I(w) = I(wx)$. 


To prove this equality, we further refine our characterization of intersection 
sets. In particular, we show that in the receive case, the intersection between the 
sender and receiver’s possible run sets stays the same, i.e. 

\[
R^G_p(w) \cap R^G_q(w) = R^G_p(wx) \cap R^G_q(wx) .
\]

Note that it is not the case that the receiver only follows a subset of the sender’s 
possible runs. In other words, with sender p and receiver q as in Fig. 3, 
$R^G_q(w) \subseteq R^G_p(w)$ is not inductive. The equality 
above simply states that a receive action can only eliminate runs that have 
already been eliminated by its sender. Figure 3 depicts this relation. 

Given that the intersection set strictly shrinks, the burden of eliminating 
runs must then fall upon send events. We show that send transitions shrink the 
possible run set of the sender in a way that is prefix-preserving. To make this 
more precise, we introduce the following definition on runs. 


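Definition 6.5 (Unique splitting of a possible run). Let G be a global type, 
p a role, and $w \in \Sigma^*_{async}$ a word. Let $\rho$ be a possible run in 
$R^G_p(w)$. We define the longest prefix of $\rho$ matching w: 

\[
\alpha' := \max\{\, \rho' \mid \rho' \leq \rho \;\wedge\; \mathsf{split}(\mathsf{trace}(\rho')){\Downarrow}_{\Sigma_p} \leq w{\Downarrow}_{\Sigma_p} \,\} .
\]

If $\alpha' \neq \rho$, we can split $\rho$ into 
$\rho = \alpha \cdot G \xrightarrow{l} G' \cdot \beta$ where $\alpha' = \alpha \cdot G$, 
$G'$ denotes the state following G, and $\beta$ denotes the suffix of $\rho$ following 
$\alpha \cdot G \cdot G'$. We call $\alpha \cdot G \xrightarrow{l} G' \cdot \beta$ the 
unique splitting of $\rho$ for p matching w. We omit the role p 
when obvious from context. This splitting is always unique because the maximal 
prefix of any $\rho \in R^G_p(w)$ matching w is unique. 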
When role p fires a send transition $p \triangleright q!m$, any run 
$\rho = \alpha \cdot G \xrightarrow{l} G' \cdot \beta$ in p’s possible run set with 
$\mathsf{split}(l){\Downarrow}_{\Sigma_p} \neq p \triangleright q!m$ is eliminated. 
While the resulting possible run set could no longer contain runs that end with 
$G' \cdot \beta$, Send Validity guarantees that it must contain runs that begin with 
$\alpha \cdot G$. This is formalized by the following lemma. 


Lemma 6.6. Let G be a global type and 
$\{\!\!\{\mathscr{P}(G, p)\}\!\!\}_{p \in \mathcal{P}}$ be the subset projection. 
Let wx be a trace of $\{\!\!\{\mathscr{P}(G, p)\}\!\!\}_{p \in \mathcal{P}}$ such that 
$x \in \Sigma_! \cap \Sigma_p$ for some $p \in \mathcal{P}$. Let $\rho$ be a run in 
$I(w)$, and $\alpha \cdot G \xrightarrow{l} G' \cdot \beta$ be the unique splitting of 
$\rho$ for p with respect to w. Then, there exists a run $\rho'$ in $I(wx)$ such that 
$\alpha \cdot G \leq \rho'$. 


This concludes our discussion of the send and receive cases in the inductive 
step to show the non-emptiness of the intersection of all roles’ possible run 
sets. The full proofs and additional definitions can be found in the extended 
version [29]. 


7 Completeness 


In this section, we prove completeness of our approach. While soundness states 
that if a global type’s subset projection is defined, it then implements the global 
type, completeness considers the reverse direction. 


Theorem 7.1 (Completeness). If G is implementable, then 
$\{\!\!\{\mathscr{P}(G, p)\}\!\!\}_{p \in \mathcal{P}}$ is defined. 


We sketch the proof and refer to the extended version [29] for the full proof. 

From the assumption that G is implementable, we know there exists a witness 
CSM that implements G. While the soundness proof picks our subset projection 
as the existential witness for showing implementability — thereby allowing us 
to reason directly about a particular implementation — completeness only guar- 
antees the existence of some witness CSM. We cannot assume without loss of 
generality that this witness CSM is our subset construction; however, we must 
use the fact that it implements G to show that Send and Receive Validity hold 
on our subset construction. 

We proceed via proof by contradiction: we assume the negation of Send and 
Receive Validity for the subset construction, and show a contradiction to the 
fact that this witness CSM implements G. In particular, we contradict protocol 
fidelity (Definition 3.1(i)), stating that the witness CSM generates precisely the 
language L(G). To do so, we exploit a simulation argument: we first show that 
the negation of Send and Receive Validity forces the subset construction to 
recognize a trace that is not a prefix of any word in $\mathcal{L}(G)$. Then, we show that
this trace must also be recognized by the witness CSM, under the assumption 
that the witness CSM implements G. 

To highlight the constructive nature of our proof, we convert our proof
obligation to a witness construction obligation. To contradict protocol fidelity, it
suffices to construct a witness trace $v_0$ satisfying two properties, where $\{B_p\}_{p \in \mathcal{P}}$
is our witness CSM:

(a) $v_0$ is a trace of $\{B_p\}_{p \in \mathcal{P}}$, and
(b) the run intersection set of $v_0$ is empty: $I(v_0) = \bigcap_{p \in \mathcal{P}} R^G_p(v_0) = \emptyset$.


We first establish the sufficiency of conditions (a) and (b). Because $\{B_p\}_{p \in \mathcal{P}}$
is deadlock-free by assumption, every prefix extends to a maximal trace. Thus,
to prove the inequality of the two languages $\mathcal{L}(\{B_p\}_{p \in \mathcal{P}})$ and $\mathcal{L}(G)$, it suffices
to prove the inequality of their respective prefix sets. In turn, it suffices to show
the existence of a prefix of a word in one language that is not a prefix of any
word in the other. We choose to construct a prefix in the CSM language that is
not a prefix in $\mathcal{L}(G)$. We again leverage the definition of intersection sets
(Definition 6.2) to weaken the property of language non-membership to the property
of having an empty intersection set as follows. By the semantics of $\mathcal{L}(G)$, for
any $w \in \mathcal{L}(G)$, there exists $w' \in \mathrm{split}(\mathcal{L}(\mathrm{GAut}(G)))$ with $w \sim w'$. For any
$w' \in \mathrm{split}(\mathcal{L}(\mathrm{GAut}(G)))$, it trivially holds that $w'$ has a non-empty intersection
set. Because intersection sets are invariant under the indistinguishability
relation $\sim$, w must also have a non-empty intersection set. Since intersection sets
are monotonically decreasing, if the intersection set of w is non-empty, then for
any $v \le w$, the intersection set of v is also non-empty. Modus tollens of the chain
of reasoning above tells us that in order to show a word is not a prefix in $\mathcal{L}(G)$,
it suffices to show that its intersection set is empty.

Having established the sufficiency of properties (a) and (b) for our witness 
construction, we present the steps to construct vo from the negation of Send and 
Receive Validity respectively. We start by constructing a trace in $\{\mathcal{C}(G,p)\}_{p \in \mathcal{P}}$
that satisfies (b), and then show that $\{B_p\}_{p \in \mathcal{P}}$ also recognizes the trace, thereby
satisfying (a). In both cases, let p be the role and s be the state for which the 
respective validity condition is violated. 

Send Validity (Definition 5.2). Let $s \xrightarrow{p \triangleright q!m} s' \in \delta_p$ be a transition such that

$$\text{tr-orig}(s \xrightarrow{p \triangleright q!m} s') \neq s.$$

First, we find a trace u of $\{\mathcal{C}(G,p)\}_{p \in \mathcal{P}}$ that satisfies: (1) role p is in state s
in the CSM configuration reached via u, and (2) the run of GAut(G) on u
visits a state in $s \setminus \text{tr-orig}(s \xrightarrow{p \triangleright q!m} s')$. We obtain such a witness u from
the split(trace(·)) of a run prefix of GAut(G) that ends in some state in
$s \setminus \text{tr-orig}(s \xrightarrow{p \triangleright q!m} s')$. Any prefix thus obtained satisfies (1) by definition of
$\mathcal{C}(G,p)$, and satisfies (2) by construction. Due to the fact that send transitions
are always enabled in a CSM, $u \cdot p \triangleright q!m$ must also be a trace of $\{\mathcal{C}(G,p)\}_{p \in \mathcal{P}}$,
thus satisfying property (a) by a simulation argument. We then argue that
$u \cdot p \triangleright q!m$ satisfies property (b), stating that $I(u \cdot p \triangleright q!m)$ is empty: the negation
of Send Validity gives that there exist no run extensions from our candidate
state in $s \setminus \text{tr-orig}(s \xrightarrow{p \triangleright q!m} s')$ with the immediate next action $p \to q : m$, and
therefore there exists no maximal run in GAut(G) consistent with $u \cdot p \triangleright q!m$.

Receive Validity (Definition 5.3). Let $s \xrightarrow{p \triangleleft q_1?m_1} s_1$ and $s \xrightarrow{p \triangleleft q_2?m_2} s_2 \in \delta_p$
be two transitions, and let $G_2 \in \text{tr-dest}(s \xrightarrow{p \triangleleft q_2?m_2} s_2)$ such that
$q_1 \neq q_2$ and $q_1 \triangleright p!m_1 \in M_{(G_2 \ldots)}$.

Constructing the witness $v_0$ pivots on finding a trace u of $\{\mathcal{C}(G,p)\}_{p \in \mathcal{P}}$ such
that both $u \cdot p \triangleleft q_1?m_1$ and $u \cdot p \triangleleft q_2?m_2$ are traces of $\{\mathcal{C}(G,p)\}_{p \in \mathcal{P}}$. Equivalently,
we show there exists a reachable configuration of $\{\mathcal{C}(G,p)\}_{p \in \mathcal{P}}$ in which p can
receive either message from distinct senders $q_1$ and $q_2$. Formally, the local state
of p has two outgoing transitions labeled with $p \triangleleft q_1?m_1$ and $p \triangleleft q_2?m_2$, and the
channels $(q_1, p)$ and $(q_2, p)$ have $m_1$ and $m_2$ at their respective heads. We construct
such a u by considering a run in GAut(G) that contains two transitions labeled
with $q_1 \to p : m_1$ and $q_2 \to p : m_2$. Such a run must exist due to the negation of
Receive Validity. We start with the split trace of this run, and argue that, from
the definition of M(·) and the indistinguishability relation $\sim$, we can perform
iterative reorderings using $\sim$ to bubble the send action $q_1 \triangleright p!m_1$ to the position
before the receive action $p \triangleleft q_2?m_2$. Then, (a) holds for $u \cdot p \triangleleft q_1?m_1$ by a simulation
argument. We then separately show that (b) holds for $u \cdot p \triangleleft q_1?m_1$, using similar
reasoning as the send case, to complete the proof that $u \cdot p \triangleleft q_1?m_1$ suffices as a
witness for $v_0$.

It is worth noting that the construction of the witness prefix vo in the 
proof immediately yields an algorithm for computing counterexample traces 
to implementability. 


Remark 7.2 (Mixed Choice is Not Needed to Implement Global Types). Theorem
7.1 shows the necessity of Send Validity for implementability. Corollary
5.5 shows that Send Validity precludes states with both send and receive
outgoing transitions. Together, this implies that an implementable global type can
always be implemented without mixed choice. Note that the syntactic
restrictions on global types do not inherently prevent mixed choice states from
arising in a role's subset construction, as evidenced by r in the following type:
$p \to q{:}l.\; q \to r{:}m.\; 0 \;+\; p \to q{:}r.\; r \to q{:}m.\; 0$. Our completeness result thus implies
that this type is not implementable. Most MST frameworks [18,24,31] implicitly
force no mixed choice through syntactic restrictions on local types. We are the
first to prove that mixed choice states are indeed not necessary for completeness.
This is interesting because mixed choice is known to be crucial for the expressive
power of the synchronous π-calculus compared to its asynchronous variant [32].
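
The mixed choice state of r in the type above can be made concrete with a small sketch. It is not the authors' tool: the edge-list encoding of the global type, the helper names (`local_edges`, `eps_closure`, `enabled_actions`) and the action tuples are assumptions made for illustration. The sketch splits each exchange into a send and a receive, hides the actions of other roles as ε, and inspects what r can do in the ε-closure of the initial state.

```python
from collections import deque

# Hypothetical encoding of  p->q:l. q->r:m. 0  +  p->q:r. r->q:m. 0
# as edges (source, (sender, receiver, message), target).
EDGES = [
    (0, ("p", "q", "l"), 1),
    (1, ("q", "r", "m"), 2),
    (0, ("p", "q", "r"), 3),
    (3, ("r", "q", "m"), 4),
]

def local_edges(role, edges):
    """Split each exchange into send and receive; other roles' actions become None (epsilon)."""
    out = []
    for i, (src, (snd, rcv, msg), dst) in enumerate(edges):
        mid = ("mid", i)  # intermediate state between the send and the receive
        send = ("!", rcv, msg) if snd == role else None
        recv = ("?", snd, msg) if rcv == role else None
        out.append((src, send, mid))
        out.append((mid, recv, dst))
    return out

def eps_closure(states, edges):
    """All states reachable from `states` via epsilon edges only."""
    seen, todo = set(states), deque(states)
    while todo:
        s = todo.popleft()
        for src, label, dst in edges:
            if src == s and label is None and dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return frozenset(seen)

def enabled_actions(subset, edges):
    """Non-epsilon actions leaving any state of the subset."""
    return {label for src, label, _ in edges if src in subset and label is not None}

r_edges = local_edges("r", EDGES)
initial = eps_closure({0}, r_edges)
actions = enabled_actions(initial, r_edges)
has_mixed_choice = (any(a[0] == "!" for a in actions)
                    and any(a[0] == "?" for a in actions))
print(sorted(actions), has_mixed_choice)
```

In this encoding, the initial subset state of r offers both a receive from q and a send to q, which is the mixed choice the remark refers to.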


8 Complexity 


In this section, we establish PSPACE-completeness of checking implementability 
for global types. 


Theorem 8.1. The MST implementability problem is PSPACE-complete. 
Proof. We first establish the upper bound. The decision procedure enumerates
for each role p the subsets of $\mathrm{GAut}(G){\downarrow}_p$. This can be done in polynomial space
and exponential time. For each p and $s \subseteq Q_G$, it then (i) checks membership of s
in $Q_p$ of $\mathcal{C}(G,p)$, and (ii) if $s \in Q_p$, checks whether all outgoing transitions of s
in $\mathcal{C}(G,p)$ satisfy Send and Receive Validity. Check (i) can be reduced to the
intersection non-emptiness problem for nondeterministic finite state machines,
which is in PSPACE [44]. It is easy to see that check (ii) can be done in
polynomial time. In particular, the computation of available messages for Receive
Validity only requires a single unfolding of every loop in G.

Note that the synthesis problem has the same complexity. The subset
construction to determinize $\mathrm{GAut}(G){\downarrow}_p$ can be done using a PSPACE transducer.
While the output can be of exponential size, it is written on an extra tape that is 
not counted towards memory usage. However, this means we need to perform the 
validity checks as described above instead of using the computed deterministic 
state machines. 

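Second, we prove the lower bound. The proof is inspired by the proof for
Theorem 4 [4] in which Alur et al. prove that checking safe realizability of bounded
HMSCs is PSPACE-hard. We reduce the PSPACE-complete problem of
checking universality of an NFA $M = (Q, \Delta, \delta, q_0, F)$ to checking implementability.
Without loss of generality, we assume that every state can reach a final state. We
construct a global type G for p, q and r that is implementable iff $\mathcal{L}(M) = \Delta^*$.
For this, we define subterms $G_l$ and $G_r$, as well as $G_q$ for every $q \in Q$ and $G_*$.
We use a fresh letter $\bot$ to handle final states of M. We also define $p \leftrightarrow q{:}m$ as
an abbreviation for $p \to q{:}m.\; q \to p{:}m$.

$$
\begin{aligned}
G &:= G_l + G_r \\
G_l &:= p \leftrightarrow q{:}l.\; p \leftrightarrow r{:}go.\; G_{q_0} \\
G_q &:= \begin{cases}
\sum_{(q,a,q') \in \delta} (r \leftrightarrow q{:}a.\; G_{q'}) & \text{if } q \notin F \\
r \leftrightarrow q{:}\bot.\; 0 + \sum_{(q,a,q') \in \delta} (r \leftrightarrow q{:}a.\; G_{q'}) & \text{if } q \in F
\end{cases} \\
G_r &:= p \leftrightarrow q{:}r.\; p \leftrightarrow r{:}go.\; G_* \\
G_* &:= r \leftrightarrow q{:}\bot.\; 0 + \sum_{a \in \Delta} (r \leftrightarrow q{:}a.\; G_*)
\end{aligned}
$$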


The global type G is constructed such that p first decides whether words from
$\mathcal{L}(M)$ or from $\Delta^*$ are sent subsequently. This decision is known to p and q but not
to r. The protocol then continues with r sending letters from $\Delta$ to q, and p is not
involved. Intuitively, q is able to receive these letters if and only if $\mathcal{L}(M) = \Delta^*$.
From Theorems 6.1 and 7.1, we know that $\{\mathcal{C}(G,p)\}_{p \in \mathcal{P}}$ implements G if G
is implementable.

We claim that $\{\mathcal{C}(G,p)\}_{p \in \mathcal{P}}$ implements G if and only if $\mathcal{L}(M) = \Delta^*$.

First, assume that $\mathcal{L}(M) \neq \Delta^*$. Then, there exists $w \notin \mathcal{L}(M)$. We can
construct the following run of $\{\mathcal{C}(G,p)\}_{p \in \mathcal{P}}$ that deadlocks. Role p chooses the
left subterm $G_l$ and, subsequently, r sends w to q. We do a case analysis on
whether w contains a prefix $w'$ such that $w' \notin \mathrm{pref}(\mathcal{L}(M))$. If so, sending the
last letter of a minimal prefix leads to a deadlock in $\{\mathcal{C}(G,p)\}_{p \in \mathcal{P}}$,
contradicting deadlock freedom. If not, it holds that w is a prefix of a word in $\mathcal{L}(M)$.
Still, role r can send $\bot$, which cannot be received, also contradicting deadlock
freedom.

Second, assume that $\mathcal{L}(M) = \Delta^*$. With this, it is fine that r does not know
the branch. Role q will be able to receive all messages since $\mathcal{C}(G,q)$ can receive,
letter by letter, $w.\bot$ for every $w \in \mathcal{L}(M)$ from r. Thus, protocol fidelity and
deadlock freedom hold, concluding the proof.
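
To make the source problem of the reduction concrete, the following sketch decides NFA universality by an on-the-fly subset construction. It illustrates the problem only and is neither the reduction nor a PSPACE procedure: it may store exponentially many subsets. The function name `is_universal` and the dictionary encoding of the transition relation are assumptions made for this example.

```python
from collections import deque

def is_universal(alphabet, delta, initial, finals):
    """Return (True, None) if the NFA accepts every word over `alphabet`,
    otherwise (False, w) for a shortest rejected word w.

    delta maps (state, letter) to an iterable of successor states.
    """
    start = frozenset([initial])
    seen = {start}
    todo = deque([(start, "")])
    while todo:
        subset, word = todo.popleft()
        if not (subset & finals):
            # No final state is reachable on `word`, so it is rejected.
            return False, word
        for a in alphabet:
            nxt = frozenset(q2 for q in subset for q2 in delta.get((q, a), ()))
            if nxt not in seen:
                seen.add(nxt)
                todo.append((nxt, word + a))
    return True, None

# Hypothetical NFAs over {a, b}: one accepts everything,
# the other rejects exactly the words ending in b.
ALL = {(0, "a"): {0}, (0, "b"): {0}}
NO_TRAILING_B = {(0, "a"): {0}, (0, "b"): {1}, (1, "a"): {0}, (1, "b"): {1}}

print(is_universal("ab", ALL, 0, frozenset({0})))
print(is_universal("ab", NO_TRAILING_B, 0, frozenset({0})))
```

A rejected word returned by such a check plays the role of the word w in the first case of the proof above.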

Note that PSPACE-hardness only holds if the size of G does not account 
for common subterms multiple times. Because every message is immediately 
acknowledged, the constructed global type specifies a universally 1-bounded [23] 
language, proving that PSPACE-hardness persists for such a restriction. For our 
construction, it does not hold that V(L(Gi)4-5, ,) = £(M). We chose so to have 
a more compact protocol. However, we can easily fix this by sending the decision 
of r first to p, which allows omitting the messages $\bot$ to q.


This result and the fact that local languages are preserved by the subset
projection (Lemma 4.3) lead to the following observation.


Corollary 8.2. Let G be an implementable global type. Then, the subset
projection $\{\mathscr{P}(G,p)\}_{p \in \mathcal{P}}$ is a local language preserving implementation for G, i.e.,
$\mathcal{L}(\mathscr{P}(G,p)) = \mathcal{L}(G){\Downarrow}_{\Sigma_p}$ for every p, and can be computed in PSPACE.


Remark 8.3 (MST implementability with directed choice is PSPACE-hard). The- 
orem 8.1 is stated for global types with sender-driven choice but the provided 
type is in fact directed. Thus, the PSPACE lower bound also holds for imple- 
mentability of types with directed choice. 


9 Evaluation 


We consider the following three aspects in the evaluation of our approach:
(E1) difficulty of implementation, (E2) completeness, and (E3) comparison to
the state of the art.

For this, we implemented our subset projection in a prototype tool [1,37]. It 
takes a global type as input and computes the subset projection for each role. 
It was straightforward to implement the core functionality in approximately 700 
lines of Python3 code closely following the formalization (E1). 

We consider global types (and communication protocols) from seven different 
sources as well as all examples from this work (cf. 1st column of Table 1). Our 
experiments were run on a computer with an Intel Core i7-1165G7 CPU and used 
at most 100MB of memory. The results are summarized in Table 1. The reported 
size is the number of states and transitions of the respective state machine, which 

Table 1. Projecting Global Types. For every protocol, we report whether it is
implementable (✓) or not (✗), the time to compute our subset projection and the generalized
projection by Majumdar et al. [31], as well as the outcome: ✓ for “implementable”, ✗
for “not implementable” and (✗) for “not known”. We also give the size of the protocol
(number of states and transitions), the number of roles |P|, and the combined size of all subset
projections (number of states and transitions).


Source  Name                                 Impl.  Subset Proj.    Size  |P|  Size     [31]
                                                    (complete)                 Proj's   (incomplete)

        Instrument Contr. Prot. A            ✓      ✓   0.4 ms      22    3    61       ✓    0.2 ms
[35]    Instrument Contr. Prot. B            ✓      ✓   0.3 ms      17    3    47       ✓    0.1 ms
        OAuth2                               ✓      ✓   0.1 ms      0     3    23       ✓    <0.1 ms
[34]    Multi Party Game                     ✓      ✓   0.5 ms      2     3    67       ✓    0.1 ms
[24]    Streaming                            ✓      ✓   0.2 ms      13    4    28       ✓    <0.1 ms
[13]    Non-Compatible Merge                 ✓      ✓   0.2 ms            3    25       ✓    0.1 ms
[45]    Spring-Hibernate                     ✓      ✓   1.0 ms      62    6    118      ✓    0.7 ms
        Group Present                        ✓      ✓   0.6 ms      5     4    85       ✓    0.6 ms
[31]    Late Learning                        ✓      ✓   0.3 ms      7     4    34       ✓    0.2 ms
        Load Balancer (n = 10)               ✓      ✓   3.9 ms      36    12   106      ✓    2.4 ms
        Logging (n = 10)                     ✓      ✓   71.5 ms     8     13   322      ✓    10.0 ms
        2 Buyer Protocol                     ✓      ✓   0.5 ms      22    3    60       ✓    0.2 ms
[38]    2B-Prot. Omit No                     ✓      ✓   0.4 ms      19    3    56       (✗)  0.1 ms
        2B-Prot. Subscription                ✓      ✓   0.7 ms      46    3    95       (✗)  0.3 ms
        2B-Prot. Inner Recursion             ✓      ✓   0.4 ms      17    3    51       ✓    0.1 ms
        Odd-even (Example 2.1)               ✓      ✓   0.5 ms      32    3    70       (✗)  0.2 ms
        G_r - Receive Val. Violated (§2)     ✗      ✗   0.1 ms      12    3    -        (✗)  <0.1 ms
        G'_r - Receive Val. Satisfied (§2)   ✓      ✓   0.2 ms      16    3    35       ✓    0.1 ms
New     G_s - Send Val. Violated (§2)        ✗      ✗   <0.1 ms     8     3    -        (✗)  <0.1 ms
        G'_s - Send Val. Satisfied (§2)      ✓      ✓   <0.1 ms     7     3    17       ✓    <0.1 ms
        G_fold (§10)                         ✓      ✓   0.4 ms      21    3    50       (✗)  0.1 ms
        G_unf (§10)                          ✓      ✓   0.4 ms      30    3    61       ✓    0.2 ms


allows us to avoid counting multiple occurrences of the same subterm. As expected,
our tool can project every implementable protocol we have considered (E2).

Regarding the comparison against the state of the art (E3), we directly com- 
pared our subset projection to the incomplete approach by Majumdar et al. [31], 
and found that the run times are in the same order of magnitude in general (typ- 
ically a few milliseconds). However, the projection of [31] fails to project four 
implementable protocols (including Example 2.1). We discuss some of the other 
examples in more detail in the next section. We further note that most of the 
run times reported by Scalas and Yoshida [36] on their model checking based 
tool are around 1s and are thus two to three orders of magnitude slower. 


10 Discussion


Success of Syntactic Projections Depends on Representation. Let us 
illustrate how unfolding recursion helps syntactic projection operators to suc- 
ceed. Consider this implementable global type, which is not syntactically pro- 
jectable: 


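$$
G_{fold} := + \begin{cases}
p \to q{:}o.\ \mu t_1.\, (p \to q{:}o.\ q \to r{:}o.\ t_1 \;+\; p \to q{:}b.\ q \to r{:}b.\ 0) \\
p \to q{:}m.\ q \to r{:}m.\ \mu t_2.\, (p \to q{:}o.\ q \to r{:}o.\ t_2 \;+\; p \to q{:}b.\ q \to r{:}b.\ 0)
\end{cases}
$$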


Similar to projection by erasure, a syntactic projection erases events that a role 
is not involved in and immediately tries to merge different branches. The merge 
operator is a partial operator that checks sufficient conditions for implementabil- 
ity. Here, the merge operator fails for r because it cannot merge a recursion 
variable binder and a message reception. Unfolding the global type preserves the 
represented protocol and resolves this issue: 


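$$
G_{unf} := + \begin{cases}
p \to q{:}o.\ + \begin{cases}
p \to q{:}b.\ q \to r{:}b.\ 0 \\
p \to q{:}o.\ q \to r{:}o.\ \mu t_1.\, (p \to q{:}o.\ q \to r{:}o.\ t_1 \;+\; p \to q{:}b.\ q \to r{:}b.\ 0)
\end{cases} \\
p \to q{:}m.\ q \to r{:}m.\ \mu t_2.\, (p \to q{:}o.\ q \to r{:}o.\ t_2 \;+\; p \to q{:}b.\ q \to r{:}b.\ 0)
\end{cases}
$$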


(We refer to [29] for visual representations of both global types.) This global type 
can be projected with most syntactic projection operators and shows that the 
representation of the global type matters for syntactic projectability. However, 
such unfolding tricks do not always work, e.g. for the odd-even protocol (Exam- 
ple 2.1). We avoid this brittleness using automata and separating the synthesis 
from checking implementability. 
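
The unfolding step used above, replacing μt.G by G[μt.G / t], can be sketched over a small term representation. The tuple-based syntax tree and the helper names (`msg`, `substitute`, `unfold`) are assumptions made for this illustration and are not taken from the prototype tool; the sketch also assumes that the replacement term is closed, so no variable capture can occur.

```python
END = ("end",)

def msg(sender, receiver, label, cont):
    """Term for  sender -> receiver : label . cont  (hypothetical constructor)."""
    return ("msg", sender, receiver, label, cont)

def substitute(term, var, replacement):
    """Replace free occurrences of recursion variable `var` in `term`."""
    kind = term[0]
    if kind == "end":
        return term
    if kind == "var":
        return replacement if term[1] == var else term
    if kind == "msg":
        _, snd, rcv, label, cont = term
        return ("msg", snd, rcv, label, substitute(cont, var, replacement))
    if kind == "choice":
        return ("choice", tuple(substitute(b, var, replacement) for b in term[1]))
    if kind == "mu":
        _, bound, body = term
        if bound == var:  # an inner binder shadows `var`
            return term
        return ("mu", bound, substitute(body, var, replacement))
    raise ValueError(f"unknown term kind: {kind}")

def unfold(term):
    """One unfolding step: mu t. G becomes G[mu t. G / t]."""
    if term[0] != "mu":
        raise ValueError("can only unfold a recursion binder")
    _, var, body = term
    return substitute(body, var, term)

# The loop under the first branch of the folded type:
#   mu t1. (p->q:o. q->r:o. t1  +  p->q:b. q->r:b. 0)
loop = ("mu", "t1", ("choice", (
    msg("p", "q", "o", msg("q", "r", "o", ("var", "t1"))),
    msg("p", "q", "b", msg("q", "r", "b", END)),
)))

unfolded = unfold(loop)
print(unfolded[0])                   # top-level constructor after unfolding
print(unfolded[1][0][4][4] == loop)  # the loop reappears after q->r:o
```

After one step, the binder no longer sits at the top of the branch: r first sees a message reception, and the binder only reappears below it.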


Entailed Properties from the Literature. We defined implementability for 
a global type as the question of whether there exists a deadlock-free CSM that 
generates the same language as the global type. Various other properties of 
implementations and protocols have been proposed in the literature. Here, we 
give a brief overview and defer to the extended version [29] for a detailed analysis. 
Progress [18], a common property, requires that every sent message is eventu-
ally received and every expected message will eventually be sent. With deadlock 
freedom, our subset projection trivially satisfies progress for finite traces. For infi- 
nite traces, as expected, fairness assumptions are required to enforce progress. 
Similarly, our subset projection prevents unspecified receptions [14] and orphan 
messages [9,21], respectively interpreted in our multiparty setting with sender- 
driven choice. We also ensure that every local transition of each role is
executable [14], i.e. it is taken in some run of the CSM. Any implementation of a
global type has the stable property [28], i.e., one can always reach a configuration 
with empty channels from every reachable configuration. While the properties 
above are naturally satisfied by our subset projection, the following ones can be 
checked directly on an implementable global type without explicitly construct- 
ing the implementation. A global type is terminating [36] iff it does not contain 
recursion and never-terminating [36] iff it does not contain term 0. 
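
Both syntactic checks amount to a traversal of the global type. The sketch below assumes a hypothetical tuple-based syntax tree (`("msg", sender, receiver, label, continuation)`, `("choice", branches)`, `("mu", variable, body)`, `("var", name)`, `("end",)`); the representation and function names are illustrative only.

```python
def contains(term, kind):
    """True if a node of the given kind occurs anywhere in the term."""
    if term[0] == kind:
        return True
    if term[0] == "msg":
        return contains(term[4], kind)
    if term[0] == "choice":
        return any(contains(branch, kind) for branch in term[1])
    if term[0] == "mu":
        return contains(term[2], kind)
    return False  # "end" or "var" of a different kind

def is_terminating(global_type):
    """Terminating: the type contains no recursion binder."""
    return not contains(global_type, "mu")

def is_never_terminating(global_type):
    """Never-terminating: the type does not contain the term 0."""
    return not contains(global_type, "end")

# Hypothetical example types.
ping = ("msg", "p", "q", "m", ("end",))
forever = ("mu", "t", ("msg", "p", "q", "m", ("var", "t")))
maybe_stop = ("mu", "t", ("choice", (
    ("msg", "p", "q", "o", ("var", "t")),
    ("msg", "p", "q", "b", ("end",)),
)))

for g in (ping, forever, maybe_stop):
    print(is_terminating(g), is_never_terminating(g))
```

A type such as `maybe_stop`, which contains both recursion and 0, is neither terminating nor never-terminating under these definitions.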


11 Related Work


MSTs were introduced by Honda et al. [24] with a process algebra semantics, 
and the connection to CSMs was established soon afterwards [20]. 

In this work, we present a complete projection procedure for global types with 
sender-driven choice. The work by Castagna et al. [13] is the only one to present 
a projection that aims for completeness. Their semantic conditions, however, are 
not effectively computable and their notion of completeness is “less demanding 
than the classical ones” [13]. They consider multiple implementations, generating 
different sets of traces, to be sound and complete with regard to a single global 
type [13, Sec. 5.3]. In addition, the algorithmic version of their conditions does 
not use global information as our message availability analysis does. 

MST implementability relates to safe realizability of HMSCs, which is unde- 
cidable in general but decidable for certain classes [30]. Stutz [38] showed that 
implementability of global types that are always able to terminate is decidable.¹
The EXPSPACE decision procedure is obtained via a reduction to safe realiz- 
ability of globally-cooperative HMSCs, by proving that the HMSC encoding [39] 
of any implementable global type is globally-cooperative and generalizing results 
for infinite executions. Thus, our PSPACE-completeness result both generalizes 
and tightens the earlier decidability result obtained in [38]. Stutz [38] also inves- 
tigates how HMSC techniques for safe realizability can be applied to the MST 
setting — using the formal connection between MST implementability and safe 
realizability of HMSCs — and establishes an undecidability result for a variant 
of MST implementability with a relaxed indistinguishability relation. 

Similar to the MST setting, there have been approaches in the HMSC liter- 
ature that tie branching to a role making a choice. We refer the reader to the 
work by Majumdar et al. [31] for a survey. 

Standard MST frameworks project a global type to a set of local types 
rather than a CSM. Local types are easily translated to FSMs [31, Def.11]. 
Our projection operator, though, can yield FSMs that cannot be expressed 
with the limited syntax of local types. Consider this implementable global type: 
p—q:0.0 + p-q:m.p—r:b.0. The subset projection for r has two final states 
connected by a transition labeled r<p?b. In the syntax of local types, 0 is the only 
term indicating termination, which means that final states with outgoing tran- 
sitions cannot be expressed. In contrast to the syntactic restrictions for global 
types, which are key to effective verification, we consider local types unneces- 
sarily restrictive. Usually, local implementations are type-checked against their 
local types and subtyping gives some implementation freedom [12,16,17,27]. 
However, one can also view our subset projection as a local specification of the 
actual implementation. We conjecture that subtyping would then amount to a 
variation of alternating refinement [5]. 

CSMs are Turing-powerful [11] but decidable classes were obtained for differ- 
ent semantics: restricted communication topology [33,42], half-duplex communi- 
cation (only for two roles) [14], input-bounded [10], and unreliable channels [2,3]. 


¹ This syntactic restriction is referred to as 0-reachability in [38].

Global types (as well as choreography automata [7]) can only express existentially
1-bounded, 1-synchronizable and half-duplex communication [39]. Key to this 
result is that sending and receiving a message is specified atomically in a global 
type — a feature Dagnino et al. [19] waived for their deconfined global types. 
However, Dagnino et al. [19] use deconfined types to capture the behavior of a 
given system rather than projecting to obtain a system that generates specified 
behaviors. 

This work relies on reliable communication as is standard for MST frame- 
works. Work on fault-tolerant MST frameworks [8,43] attempts to relax this 
restriction. In the setting of reliable communication, both context-free [25,40] 
and parametric [15,22] versions of session types have been proposed to capture 
more expressive protocols and entire protocol families respectively. Extending 
our approach to these generalizations is an interesting direction for future work. 


Abstract. Legal properties involve reasoning about data values and
time. Metric first-order temporal logic (MFOTL) provides a rich formal- 
ism for specifying legal properties. While MFOTL has been successfully 
used for verifying legal properties over operational systems via runtime 
monitoring, no solution exists for MFOTL-based verification in early- 
stage system development captured by requirements. Given a legal prop- 
erty and system requirements, both formalized in MFOTL, the compli- 
ance of the property can be verified on the requirements via satisfia- 
bility checking. In this paper, we propose a practical, sound, and com- 
plete (within a given bound) satisfiability checking approach for MFOTL. 
The approach, based on satisfiability modulo theories (SMT), employs a 
counterexample-guided strategy to incrementally search for a satisfying 
solution. We implemented our approach using the Z3 SMT solver and 
evaluated it on five case studies spanning the healthcare, business admin- 
istration, banking and aviation domains. Our results indicate that our 
approach can efficiently determine whether legal properties of interest are 
met, or generate counterexamples that lead to compliance violations. 


1 Introduction 


Software systems, such as medical systems, are increasingly required to com- 
ply with laws and regulations aimed at ensuring safety, security, and data pri- 
vacy [1,36]. The properties stipulated by these laws and regulations — which 
we refer to as legal properties (LP) hereafter — typically involve reasoning about 
actions, ordering and time. As an example, consider the following LP, P1, derived 
from a health-data regulation (s.11, PHIPA [20]): “If personal health information 
is not accurate or not up-to-date, it should not be accessed”. In this property, 
the accuracy and the freshness of the data depend on how and when the data 
was collected and updated before being accessed. Specifically, this property con- 
strains the data action access to have accurate and up-to-date data values, which 
further constrains the order and time of access with respect to other data actions. 

System compliance with LPs can be checked on the system design or on 
an operational model of a system implementation. In this paper, we focus on 
the early stage, where one can check whether a formalization of the system 
requirements satisfies an LP. The formalization can be done using a descriptive
formalism like temporal logic [24,35]. For instance, the requirement ($req_0$) of
a data collection system: “no data can be accessed prior to 15 days after the
data has been collected” needs to be formalized for verifying compliance of P1.
It is important to formalize the data and time constraints of both the system
requirements and LPs, such as the ones of P1 and $req_0$.

Metric first-order temporal logic (MFOTL) enables the specification of data 
and time constraints [3] and has an expressive formalism for capturing LPs 
and the related system requirements that constrain data and time [1]. Existing 
work on MFOTL verification focuses on detecting violations at run-time through 
monitoring [1,19], with MFOTL formulas being checked on execution logs. There 
is an unmet need for determining the satisfiability of MFOTL specifications, i.e.,
for finding possible LP violations in an MFOTL specification. This is important for
designing systems that comply with their legal requirements.

MFOTL satisfiability checking is generally undecidable since MFOTL is an 
extension of first-order logic (FOL). Restrictions are thus necessary for making 
the problem decidable. In this paper, we restrict ourselves to safety properties. 
For safety properties, LP violations are finite sequences of data actions, cap- 
tured via a finite-length counterexample. For example, a possible violation of 
P1 is a sequence consisting of storing a value v in a variable d, updating d’s 
value to v’, then reading d again and not obtaining v’. Since we are interested 
in finite counterexamples, bounded verification is a natural strategy to pursue 
for achieving decidability. SAT solvers have been previously used for bounded 
satisfiability checking of metric temporal logic (MTL) [24,35]. However, MTL 
cannot effectively capture quantified data constraints in LPs, hence the solution 
is not applicable directly. As an extension to MTL, MFOTL can effectively
capture data constraints used in LPs. Yet, to the best of our knowledge, there has
not been any prior work on bounded MFOTL satisfiability checking. 

To establish a bound in bounded verification, researchers have predominantly 
relied on bounding the size of the universe [13]. Bounding the universe would be 
too restrictive because LPs routinely refer to variables with large ranges, e.g., 
timed actions spanning several years. Instead, we bound the number of data 
actions in a run, which bounds the number of actions in the counterexample. 

Equipped with our proposed notion of a bound, we develop an incremental 
approach (IBS) for bounded satisfiability checking of MFOTL. We first trans- 
late the MFOTL property and requirements into first-order logic formulas with 
quantified relational objects (FOL*). We then incrementally ground the FOL* 
constraints to eliminate the quantifiers by considering an increasing number of 
relational objects. Subsequently, we check the satisfiability of the resulting con- 
straints using an SMT solver. Specifically, we make the following contributions: 
(1) we propose a translation of MFOTL formulas to FOL*; (2) we provide a 
novel bounded satisfiability checking solution, IBS, for the translated FOL*
formulas with incremental and counterexample-guided over- and under-approximation.
Note that while our solution to MFOTL satisfiability checking can be applied to
a broader set of applications, in this paper we focus on the legal domain. We

$P1 = \Box\, \forall d, v.\, (\mathit{Access}(d,v) \Rightarrow ((\forall v'.\, (v' \neq v \Rightarrow \neg \mathit{Update}(d,v') \wedge \neg \mathit{Collect}(d,v')))\ \mathcal{S}\ (\mathit{Update}(d,v) \vee \mathit{Collect}(d,v))))$
If personal health information is not accurate or not up-to-date, it should not be accessed.

$req_0 = \Box\, \forall d, v.\, (\mathit{Access}(d,v) \Rightarrow \blacklozenge_{[360,)}\, \exists v'.\, \mathit{Collect}(d,v'))$
No data is allowed to be accessed before the data ID has been collected for at least 15 days (360 hours).

$req_1 = \Box\, \forall d, v.\, (\mathit{Update}(d,v) \Rightarrow \neg(\blacklozenge_{[1,168)}\, \exists v'.\, (\mathit{Collect}(d,v') \vee \mathit{Update}(d,v'))))$
Data value can only be updated after having been collected or last updated for more than a week (168 hours).

$req_2 = \Box\, \forall d, v.\, (\mathit{Access}(d,v) \Rightarrow \blacklozenge_{[0,168)}\, \mathit{Collect}(d,v) \vee \mathit{Update}(d,v))$
Data can only be accessed if it has been collected or updated within a week (168 hours).

$req_3 = \Box\, \forall d, v.\, (\mathit{Collect}(d,v) \Rightarrow \neg(\exists v''.\, (\mathit{Collect}(d,v'') \wedge v \neq v'') \vee \blacklozenge_{[1,)}\, \exists v'.\, \mathit{Collect}(d,v')))$
No data re-collection.


Fig. 1. Example requirements and legal property P1 of DCC, with signature $S_{data} =
(\emptyset, \{\mathit{Collect}, \mathit{Update}, \mathit{Access}\}, \iota_{data})$, where $\iota_{data}(\mathit{Collect}) = \iota_{data}(\mathit{Update}) = \iota_{data}(\mathit{Access}) = 2$.


[Figure: timelines of five traces, each listing its data actions (Collect, Update, Access) together with the time points at which they occur.]


Fig. 2. Five traces from the DCC example.


empirically evaluate IBS on five case studies with a total of 24 properties showing 
that it can effectively and efficiently find LP violations or prove satisfiability. 

The rest of this paper is organized as follows. Sect. 2 provides background and 
establishes our notation. Sect. 3 defines the bounded satisfiability checking (BSC) 
problem. Sect. 4 provides an overview of our solution and the translation of 
MFOTL to FOL*. Sect. 5 presents our solution; proofs of soundness, termination 
and optimality are available in the extended version [11]. Sect. 6 reports on the 
experiments performed to validate our bounded satisfiability checking solution 
for MFOTL. Sect. 7 discusses related work. Sect. 8 concludes the paper. 


2 Preliminaries 


In this section, we describe metric first-order temporal logic (MFOTL) [3]. 

Syntax. Let $\mathbb{I}$ be a set of non-empty intervals over $\mathbb{N}$. An interval $I \in \mathbb{I}$ can
be expressed as $[b, b')$ where $b \in \mathbb{N}$ and $b' \in \mathbb{N} \cup \{\infty\}$. A signature S is a tuple
$(C, R, \iota)$, where C is a set of constants and R is a finite set of predicate symbols
(for relations). Without loss of generality, we assume all constants
are from the integer domain $\mathbb{Z}$ where the theory of linear integer arithmetic (LIA)
holds. The function $\iota : R \to \mathbb{N}$ associates each predicate symbol $r \in R$ with an
arity $\iota(r) \in \mathbb{N}$. Let Var be a countably infinite set of variables from domain $\mathbb{Z}$.
A term t is defined inductively as $t : c \mid v \mid t + t \mid c \times t$. We denote by $\bar{t}$ a
vector of terms and by $\bar{t}^k_x$ the vector that contains x at index k. The syntax of
MFOTL formulas is defined as follows: (1) $\top$ and $\bot$, representing values “true”
and “false”; (2) $t = t'$ and $t > t'$, for terms t and $t'$; (3) $r(t_1 \ldots t_{\iota(r)})$ for $r \in R$ and
terms $t_1 \ldots t_{\iota(r)}$; (4) $\phi \wedge \psi$, $\neg\phi$ for MFOTL formulas $\phi$ and $\psi$; (5) $\exists x.(r(\bar{t}^k_x) \wedge \phi)$
for MFOTL formula $\phi$, relation symbol $r \in R$, variable $x \in Var$ and a vector of
terms $\bar{t}^k_x$ s.t. $x = \bar{t}^k_x[k]$; and (6) $\phi\ \mathcal{U}_I\ \psi$ (until), $\phi\ \mathcal{S}_I\ \psi$ (since), $\bigcirc_I \phi$ (next), $\bullet_I \phi$
(previous) for MFOTL formulas $\phi$ and $\psi$, and an interval $I \in \mathbb{I}$.

We consider a restricted form of quantification (syntax rule (5), above)
similar to guarded quantification [18]. Every existentially quantified variable x must
be guarded by some relation r (i.e., for some $\bar{t}$, $r(\bar{t})$ holds and x appears in
$\bar{t}$). Similarly, universal quantification must be guarded as $\forall x.(r(\bar{t}) \Rightarrow \phi)$ where
$x \in \bar{t}$. Thus, $\exists x.\neg r(x)$ (and $\forall x.r(x)$) are not allowed.

The temporal operators $\mathcal{U}_I$, $\mathcal{S}_I$, $\bullet_I$ and $\bigcirc_I$ require the satisfaction of the
formula within the time interval given by $I$. We write $[b,)$ as a shorthand for
$[b, \infty)$; if $I$ is omitted, then the interval is assumed to be $[0, \infty)$. Other classical
unary temporal operators $\Diamond_I$ (eventually), $\Box_I$ (always), and $\blacklozenge_I$ (once) are defined
as follows: $\Diamond_I \phi = \top \,\mathcal{U}_I\, \phi$, $\Box_I \phi = \neg\Diamond_I \neg\phi$, and $\blacklozenge_I \phi = \top \,\mathcal{S}_I\, \phi$. Other common
logical operators such as $\lor$ (disjunction) and $\forall$ (universal quantification) are
expressed through negation of $\land$ and $\exists$, respectively.


Example 1. Suppose a data collection centre (DCC) collects and accesses
personal data information with three requirements: $req_0$ stating that no data
is allowed to be accessed before the data ID has been collected for 15 days
(360 hours); $req_1$: data can only be updated after having been collected or last
updated for more than a week (168 hours); and $req_2$: data value can only be
accessed if the value has been collected or updated within a week (168 hours).
The signature $S_{data}$ for DCC contains three binary relations ($R_{data}$): Collect,
Update, and Access, such that Collect(d, v), Update(d, v) and Access(d, v) hold
at a given time point if and only if data at id d is collected, updated, and accessed
with value v at this time point, respectively. The MFOTL formulas for P1, $req_0$,
$req_1$, and $req_2$ are shown in Fig. 1. For instance, the formula $req_0$ specifies that if
a data value stored at id d is accessed, then some data must have been collected
and stored at id d at least 360 hours ago ($\blacklozenge_{[360,)}$).


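Semantics. A first-order (FO) structure $D$ over the signature $S = (C, R, \iota)$
is comprised of a non-empty domain $dom(D) \neq \emptyset$ and an interpretation for
$c^D \in dom(D)$ and $r^D \subseteq dom(D)^{\iota(r)}$ for each $c \in C$ and $r \in R$. The semantics of
MFOTL formulas is defined over a sequence of FO structures $\bar{D} = (D_0, D_1, \ldots)$
and a sequence of natural numbers representing time $\bar{\tau} = (\tau_0, \tau_1, \ldots)$, where (a)
$\bar{\tau}$ is a monotonically increasing sequence; (b) $dom(D_i) = dom(D_{i+1})$ for all $i \geq 0$
(all $D_i$ have a fixed domain); and (c) each constant symbol $c \in C$ has the same
interpretation across $\bar{D}$ (i.e., $c^{D_i} = c^{D_{i+1}}$). Property (a) ensures that time never
decreases as the sequence progresses; and (b) ensures that the domain is fixed
(referred to as $dom(\bar{D})$). $\bar{D}$ is similar to timed words in metric temporal logic (MTL),
but instead of associating a set of propositions with each time point, MFOTL
uses a structure $D$ to interpret the symbols in the signature $S$. The semantics
of MFOTL is defined over a trace of timed first-order structures $\sigma = (\bar{D}, \bar{\tau})$,
where every structure $D_i \in \bar{D}$ specifies the set of tuples ($r^{D_i}$) that hold for
every relation $r$ at time $\tau_i \in \bar{\tau}$. Let $(\bar{D}, \bar{\tau})$ denote an MFOTL trace.


$(\bar{D}, \bar{\tau}, v, i) \models t = t'$ iff $v(t) = v(t')$

$(\bar{D}, \bar{\tau}, v, i) \models t > t'$ iff $v(t) > v(t')$

$(\bar{D}, \bar{\tau}, v, i) \models r(t_1, \ldots, t_{\iota(r)})$ iff $(v(t_1), \ldots, v(t_{\iota(r)})) \in r^{D_i}$

$(\bar{D}, \bar{\tau}, v, i) \models \neg\phi$ iff $(\bar{D}, \bar{\tau}, v, i) \not\models \phi$

$(\bar{D}, \bar{\tau}, v, i) \models \phi \land \psi$ iff $(\bar{D}, \bar{\tau}, v, i) \models \phi$ and $(\bar{D}, \bar{\tau}, v, i) \models \psi$

$(\bar{D}, \bar{\tau}, v, i) \models \exists x.(r(\bar{t}_x^k) \land \phi)$ iff $(\bar{D}, \bar{\tau}, v[x \mapsto d], i) \models r(\bar{t}_x^k) \land \phi$ for some $d \in dom(\bar{D})$

$(\bar{D}, \bar{\tau}, v, i) \models \bigcirc_I \phi$ iff $(\bar{D}, \bar{\tau}, v, i+1) \models \phi$ and $\tau_{i+1} - \tau_i \in I$

$(\bar{D}, \bar{\tau}, v, i) \models \bullet_I \phi$ iff $i \geq 1$ and $(\bar{D}, \bar{\tau}, v, i-1) \models \phi$ and $\tau_i - \tau_{i-1} \in I$

$(\bar{D}, \bar{\tau}, v, i) \models \phi \,\mathcal{U}_I\, \psi$ iff there exists $j \geq i$ such that $(\bar{D}, \bar{\tau}, v, j) \models \psi$ and $\tau_j - \tau_i \in I$,
and for all $k \in \mathbb{N}$, $i \leq k < j \Rightarrow (\bar{D}, \bar{\tau}, v, k) \models \phi$

$(\bar{D}, \bar{\tau}, v, i) \models \phi \,\mathcal{S}_I\, \psi$ iff there exists $j \leq i$ such that $(\bar{D}, \bar{\tau}, v, j) \models \psi$ and $\tau_i - \tau_j \in I$,
and for all $k \in \mathbb{N}$, $i \geq k > j \Rightarrow (\bar{D}, \bar{\tau}, v, k) \models \phi$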


Fig. 3. MFOTL semantics. 


Example 2. Consider the signature $S_{data}$ in the DCC example. Let $\tau_1 = 0$ and
$\tau_2 = 361$, and let $D_1$ and $D_2$ be two first-order structures with $r^{D_1} = Collect(0, 0)$
and $r^{D_2} = Access(0, 0)$, respectively. The trace $\sigma_1 = ((D_1, D_2), (\tau_1, \tau_2))$ is a
valid trace shown in Fig. 2 and representing two timed relations: (1) data value
0 collected and stored at id 0 at hour 0 and (2) data value 0 is read by accessing
id 0 at hour 361.
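
To make the semantics of Fig. 3 concrete, the following is a minimal Python sketch that evaluates the since operator, and the derived once operator, over a finite trace. It is an illustration under stated assumptions rather than the authors' implementation: the trace encoding (a list of (time, set of tuples) pairs), the function names, and the use of Python predicates for subformulas are all hypothetical. It is applied to the trace of Example 2.

```python
import math

def in_interval(delta, interval):
    """Check that delta lies in the half-open interval [lo, hi)."""
    lo, hi = interval
    return lo <= delta < hi

def since(trace, i, interval, phi, psi):
    """phi S_I psi at position i: some j <= i satisfies psi with
    tau_i - tau_j in I, and phi holds at every k with j < k <= i."""
    tau_i = trace[i][0]
    for j in range(i, -1, -1):
        if psi(trace, j) and in_interval(tau_i - trace[j][0], interval):
            if all(phi(trace, k) for k in range(j + 1, i + 1)):
                return True
    return False

def once(trace, i, interval, psi):
    """once_I psi is defined as (true S_I psi)."""
    return since(trace, i, interval, lambda t, k: True, psi)

def holds(fact):
    """Atomic predicate: the given tuple holds in the structure at position k."""
    return lambda trace, k: fact in trace[k][1]

# Trace of Example 2: Collect(0, 0) at hour 0, Access(0, 0) at hour 361.
sigma1 = [(0, {("Collect", 0, 0)}), (361, {("Access", 0, 0)})]

# Evaluated at the access (position 1): was id 0 collected at least 360 hours ago?
collected_long_ago = once(sigma1, 1, (360, math.inf), holds(("Collect", 0, 0)))
# Was it collected less than 360 hours ago?
collected_recently = once(sigma1, 1, (0, 360), holds(("Collect", 0, 0)))
print(collected_long_ago, collected_recently)
```

The sketch only covers finite traces and a fixed valuation; quantifiers and the future operators would need the same treatment in the other direction.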


A valuation function $v : Var \to dom(\bar{D})$ maps a set $Var$ of variables to
their interpretations in the domain $dom(\bar{D})$. For vectors $\bar{x} = (x_1, \ldots, x_n)$ and
$\bar{d} = (d_1, \ldots, d_n) \in dom(\bar{D})^n$, the update operation $v[\bar{x} \mapsto \bar{d}]$ yields a new
valuation function $v'$ s.t. $v'(x_i) = d_i$ for $1 \leq i \leq n$, and $v(x') = v'(x')$ for every
$x' \notin \bar{x}$. For any constant $c$, $v(c) = c^D$. Let $\bar{D}$ be a sequence of FO structures
over signature $S = (C, R, \iota)$ and $\bar{\tau}$ be a sequence of natural numbers. Let $\phi$ be
an MFOTL formula over $S$, $v$ be a valuation function and $i \in \mathbb{N}$. A fragment of
the relation $(\bar{D}, \bar{\tau}, v, i) \models \phi$ is defined in Fig. 3.

The operators $\bullet_I$, $\bigcirc_I$, $\mathcal{U}_I$ and $\mathcal{S}_I$ are augmented with an interval $I \in \mathbb{I}$ which
defines the satisfaction of the formula within a time range specified by $I$ relative
to the current time at step $i$, i.e., $\tau_i$.


Definition 1 (MFOTL Satisfiability). An MFOTL formula $\phi$ is satisfiable
if there exists a sequence of FO structures $\bar{D}$ and natural numbers $\bar{\tau}$, and a
valuation function $v$ such that $(\bar{D}, \bar{\tau}, v, 0) \models \phi$. $\phi$ is unsatisfiable otherwise.


Example 3. In the DCC example, the MFOTL formula $req_0$ is satisfiable
because $(\bar{D}, \bar{\tau}, v, 0) \models req_0$ (where $\sigma_1 = (\bar{D}, \bar{\tau})$ in Fig. 2). Let $req'_0$ be another
MFOTL formula: $\Diamond_{[0,359]}\, \exists j.(Access(0, j))$. The formula $req_0 \land req'_0$ is unsatisfiable
because if data stored at id 0 is accessed between 0 and 359 hours, then it
is impossible to collect the data at least 360 hours prior to its access.


3 Bounded Satisfiability Checking Problem 


The satisfiability of MFOTL properties is generally undecidable since MFOTL is
expressive enough to describe the blank tape problem [31] (which has been shown
to be undecidable). Despite the undecidability result, we can derive a bounded
version of the problem, bounded satisfiability checking (BSC), for which a sound
and complete decision procedure exists. When facing a hard instance for
satisfiability checking, the solution to BSC provides bounded guarantees (i.e., whether
a solution exists within a given bound). In this section, we first define
satisfiability checking and then the BSC problem for MFOTL formulas. Satisfiability
checking [32] is a verification technique that extends model checking by replacing
a state transition system with a set of temporal logic formulas. In the following,
we define satisfiability checking of MFOTL formulas.


Definition 2 (Satisfiability Checking of MFOTL Formulas). Let $P$ be
an MFOTL formula over a signature $S = (C, R, \iota)$, and let $Reqs$ be a set of
MFOTL requirements over $S$. $Reqs$ complies with $P$ (denoted as $Reqs \Rightarrow P$) iff
$\bigwedge_{\psi \in Reqs} \psi \land \neg P$ is unsatisfiable. We call a solution to $\bigwedge_{\psi \in Reqs} \psi \land \neg P$, if one
exists, a counterexample to $Reqs \Rightarrow P$.


Example 4. Consider our DCC system requirements and the privacy data property
P1 stating that if personal health information is not accurate or not
up-to-date, it should not be accessed (see Fig. 1). P1 is not respected by the set
of DCC requirements $\{req_0, req_1, req_2\}$ because $\neg P1 \land req_0 \land req_1 \land req_2$ is
satisfiable. The counterexample $\sigma_2$ (shown in Fig. 2) indicates that data can
be re-collected, and the re-collection does not have the same time restriction
as the updates. If a fourth policy requirement $req_3$ (Fig. 1) is added to
prohibit re-collection of collected data, then property P1 would be respected (i.e.,
$\{req_0, req_1, req_2, req_3\} \Rightarrow P1$).


Definition 3 (Finite trace and bounded trace). Given a trace $\sigma =
(\bar{D}, \bar{\tau}, v)$, we use $vol(\sigma)$ (the volume of $\sigma$) to denote the total number of times
that any relation holds across all FO structures in $\bar{D}$ (i.e., $\sum_{r \in R} \sum_{D_i \in \bar{D}} |r^{D_i}|$).
The trace $\sigma$ is finite if $vol(\sigma)$ is finite. The trace is bounded by volume $vb \in \mathbb{N}$
if and only if $vol(\sigma) \leq vb$.


Example 5. The volume of trace $\sigma_3$ in Fig. 2 is $vol(\sigma_3) = 3$ since there are three
relations: Collect(1, 15), Update(1, 0), and Access(1, 15). Note that the volume
is the total number of tuples that hold for any relation across all time points;
multiple tuples can thus hold for multiple relations for a single time point.
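
As a small illustration of Definition 3, the volume can be computed by summing the number of tuples held at each time point. The sketch below assumes a hypothetical encoding of a trace as a list of (time, set of tuples) pairs; the time stamps are invented for illustration, and only the three tuples come from Example 5.

```python
def volume(trace):
    """vol(sigma): number of tuples that hold, summed over all relations
    and all structures in the trace."""
    return sum(len(facts) for _time, facts in trace)

def bounded_by(trace, vb):
    """A trace is bounded by vb iff its volume does not exceed vb."""
    return volume(trace) <= vb

# Hypothetical time stamps; the tuples are those listed in Example 5.
sigma3 = [
    (0, {("Collect", 1, 15)}),
    (200, {("Update", 1, 0)}),
    (300, {("Access", 1, 15)}),
]
print(volume(sigma3), bounded_by(sigma3, 3), bounded_by(sigma3, 2))

# Two tuples at one time point contribute two units of volume.
busy = [(0, {("Collect", 1, 15), ("Access", 2, 7)})]
print(volume(busy))
```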


Definition 4 (Bounded satisfiability checking of MFOTL properties).
Let $P$ be an MFOTL property, $Reqs$ be a set of MFOTL requirements, and $vb$ be
a natural number. The bounded satisfiability checking problem determines the
existence of a counterexample $\sigma$ to $Reqs \Rightarrow P$ such that $vol(\sigma) \leq vb$.


4 Checking Bounded Satisfiability 


In this section, we present an overview of the bounded satisfiability checking
(BSC) process that translates the MFOTL formula into first-order logic with
relational objects (FOL*) formulas, and looks for a satisfying solution for the
FOL* formulas. Then, we provide the translation of MFOTL formulas to FOL*
and discuss the process complexity.


Fig. 4. Overview of the naive and our incremental (IBS) MFOTL bounded satisfiability
checking approaches. Solid boxes and arrows are shared between the two approaches.
Blue dashed arrow is specific to the naive approach. Red dotted arrows and the
additional red output in bracket are specific to IBS. (Color figure online)


4.1 Overview of BSC for MFOTL Formulas


We aim to address the bounded satisfiability checking problem (Definition 4),
looking for a satisfying run $\sigma$ within a given volume bound $vb$ that limits the
number of relations in $\sigma$. First, we TRANSLATE the MFOTL formulas to FOL*
formulas. The considered constraints in the formulas include those of the system
requirements and the legal property, and optional data constraints specifying
the data value constraint for a datatype. The data constraints can be defined as
a range, a “small” data set, or the union/intersection of other data constraints.
If data constraints are not specified, then the data value comes from the domain
$\mathbb{Z}$. Note that the optional data constraints do not affect the complexity of BSC,
but they do help prune unrealistic counterexamples. Second, we SEARCH for a
satisfying solution to the FOL* formula; an SMT solver is used here to determine
the satisfiability of the FOL* constraints and the data domain constraints. The
answer from the SMT solver is analyzed to return an answer to the satisfiability
checking problem (a counterexample $\sigma$, or “bounded-UNSAT”).


4.2 Translation of MFOTL to First-Order Logic 


In this section, we describe the translation target FOL* and the translation
rules, and prove their correctness.


FOL with Relational Objects (FOL*). We start by introducing the syntax
of FOL*. A signature $S$ is a tuple $(C, R, \iota)$, where $C$ is a set of constants, $R$ is a
set of relation symbols, and $\iota : R \to \mathbb{N}$ is a function that maps a relation to its
arity. We assume that the domain of constants $C$ is $\mathbb{Z}$, which matches the one for
MFOTL, where the theory of linear integer arithmetic (LIA) holds. Let $Var$ be a
set of variables in the domain $\mathbb{Z}$. A relational object $o$ of class $r \in R$ (denoted as
$o : r$) is an object with $\iota(r)$ regular attributes and two special attributes, where
every attribute is a variable. We assume that all regular attributes are ordered
and denote $o[i]$ to be the $i$th attribute of $o$. Some attributes are named, and $o.x$
refers to $o$’s attribute with the name ‘x’. Each relational object $o$ has two special
attributes $o.ext$ and $o.time$. The former is a boolean variable indicating whether
$o$ exists in a solution, and the latter is a variable representing the occurrence time
of $o$. For convenience, we define a function CLS($o$) to return the relational object’s
class. Let a FOL* term $t$ be defined inductively as $t ::= c \mid v \mid o[k] \mid o.x \mid t + t \mid c \times t$
for any constant $c \in C$, any variable $v \in Var$, any relational object $o : r$, any
index $k \in [1, \iota(r)]$ and any valid attribute name $x$. Given a signature $S$, the
syntax of the FOL* formulas is defined as follows: (1) $\top$ and $\bot$, representing
values “true” and “false”; (2) $t = t'$ and $t > t'$, for terms $t$ and $t'$; (3) $\phi_f \land \psi_f$,
$\neg\phi_f$ for FOL* formulas $\phi_f$ and $\psi_f$; (4) $\exists o : r \cdot (\phi_f)$ for an FOL* formula $\phi_f$ and
a class $r$; (5) $\forall o : r \cdot (\phi_f)$ for an FOL* formula $\phi_f$ and a class $r$. The quantifiers
for FOL* formulas are limited to relational objects, as shown by rules (4) & (5).
Operators $\lor$ and $\forall$ can be defined in FOL* as follows: $\phi_f \lor \psi_f = \neg(\neg\phi_f \land \neg\psi_f)$
and $\forall o : r \cdot \phi_f = \neg\exists o : r \cdot \neg\phi_f$. We say an FOL* formula is in
negation normal form (NNF) if negations ($\neg$) do not appear in front of $\neg$, $\land$, $\lor$, $\exists$ and $\forall$. For the
rest of the paper, we assume that every FOL* formula $\phi$ is in NNF.

Given a signature $S$, a domain $D$ is a finite set of relational objects. An
FOL* formula grounded in the domain $D$ (denoted by $\phi_D$) is a quantifier-free
FOL formula that eliminates quantifiers on relational objects using the following
rules: (1) $\exists o : r \cdot (\phi_f)$ to $\bigvee_{o' : r \in D} (o'.ext \land \phi_f[o \leftarrow o'])$ and (2) $\forall o : r \cdot (\phi_f)$ to
$\bigwedge_{o' : r \in D} (o'.ext \Rightarrow \phi_f[o \leftarrow o'])$. An FOL* formula $\phi_f$ is satisfiable in $D$ if there
exists a variable assignment $v$ that evaluates $\phi_D$ to $\top$ according to the standard
semantics of FOL. An FOL* formula $\phi_f$ is satisfiable if there exists a finite
domain $D$ such that $\phi_f$ is satisfiable in $D$. We call $\sigma = (D, v)$ a satisfying
solution to $\phi_f$, denoted as $\sigma \models \phi_f$. Given a solution $\sigma = (D, v)$, we say a
relational object $o$ is in $\sigma$, denoted as $o \in \sigma$, if $o \in D$ and $v(o.ext)$ is true. The
volume of the solution, denoted as $vol(\sigma)$, is $|\{o \mid o \in \sigma\}|$.


Example 6. Let $a$ be a relational object of class $A$ with attribute name $val$.
The formula $\forall a : A \cdot (\exists a' : A \cdot (a.val < a'.val) \land \exists a : A \cdot a.val = 0)$ has
no satisfying solutions in any finite domain. On the other hand, the formula
$\forall a : A \cdot (\exists a', a'' : A \cdot (a.val = a'.val + a''.val) \land \exists a : A \cdot a.val = 5)$ has a solution
$\sigma = (D, v)$ of volume 2, with the domain $D = (a_1, a_2)$ and the value function
$v(a_1.val) = 5$, $v(a_2.val) = 0$ because if $a \leftarrow a_1$, then the formula is satisfied by
assigning $a' \leftarrow a_1$, $a'' \leftarrow a_2$; and if $a \leftarrow a_2$, then the formula is satisfied by
assigning $a' \leftarrow a_2$, $a'' \leftarrow a_2$.
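
The solution claimed in Example 6 can be checked mechanically by expanding the quantifiers over the finite domain, in the spirit of grounding rules (1) and (2). The sketch below does this for a fixed valuation, so no solver is needed; the class `Obj` and the helper names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Obj:
    """A relational object of class A under a fixed valuation (hypothetical encoding)."""
    name: str
    val: int
    ext: bool = True

def exists(domain, body, arity=1):
    """Rule (1): disjunction over domain objects, each guarded by its ext flag."""
    return any(all(o.ext for o in objs) and body(*objs)
               for objs in product(domain, repeat=arity))

def forall(domain, body):
    """Rule (2): conjunction of (ext implies body) over domain objects."""
    return all((not o.ext) or body(o) for o in domain)

def second_formula(domain):
    # forall a . (exists a', a'' . a.val = a'.val + a''.val  and  exists b . b.val = 5)
    return forall(domain, lambda a:
                  exists(domain, lambda a1, a2: a.val == a1.val + a2.val, arity=2)
                  and exists(domain, lambda b: b.val == 5))

def first_formula(domain):
    # forall a . (exists a' . a.val < a'.val  and  exists b . b.val = 0)
    return forall(domain, lambda a:
                  exists(domain, lambda a1: a.val < a1.val)
                  and exists(domain, lambda b: b.val == 0))

D = (Obj("a1", 5), Obj("a2", 0))
print(second_formula(D))        # the stated solution satisfies the second formula
print(first_formula(D))         # the same domain does not satisfy the first one
print(sum(o.ext for o in D))    # volume: number of objects with ext = true
```

Checking one domain does not prove that the first formula has no finite solution; it only shows that this particular domain fails, because the object with the largest value has no strictly larger partner.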


From MFOTL Formulas to FOL* Formulas. We now discuss the translation
rule from the MFOTL formulas to FOL* formulas. Recall that MFOTL semantics
is defined for a time point $i$ on a trace $\sigma = (\bar{D}, \bar{\tau}, v, i)$, where $\bar{D} = (D_1, D_2, \ldots)$
is a sequence of FO structures and $\bar{\tau} = (\tau_1, \tau_2, \ldots)$ is a sequence of time values.
The time value of the time point $i$ is given by $\tau_i$, and if $i$ is not specified, then
$i = 1$. The semantics of the FOL* formulas is defined for a domain $D$ where the
information of time is associated with relational objects in the domain.
Therefore, the time point $i$ (and its time value $\tau_i$) should be considered during the
translation from MFOTL to FOL* since the same MFOTL formula at different
time points represents different constraints on the trace $\sigma$. Formally, our
translation function TRANSLATE, abbreviated as $T$, translates an MFOTL formula
$\phi$ into a function $f : \tau \to \phi_f$, where $\tau \in \mathbb{N}$ and $\phi_f$ is an FOL* formula. The
translation rules are stated in Fig. 5.


$T(t = t', \tau_i) \to t = t'$
$T(t > t', \tau_i) \to t > t'$
$T(r(t_1, \ldots, t_{\iota(r)}), \tau_i) \to \exists o : r \cdot \bigwedge_{j=1}^{\iota(r)} (o[j] = t_j) \land (\tau_i = o.time)$
$T(\neg\phi, \tau_i) \to \neg T(\phi, \tau_i)$
$T(\phi \land \psi, \tau_i) \to T(\phi, \tau_i) \land T(\psi, \tau_i)$
$T(\exists x \cdot r(\bar{t}_x^k) \land \phi, \tau_i) \to \exists o : r \cdot T((r(\bar{t}_x^k) \land \phi)[x \leftarrow o[k]], \tau_i)$
$T(\bigcirc_I \phi, \tau_i) \to \exists o : TP \cdot NEXT(o.time, \tau_i) \land T(\phi, o.time) \land (o.time - \tau_i) \in I$
$T(\bullet_I \phi, \tau_i) \to \exists o : TP \cdot PREV(o.time, \tau_i) \land T(\phi, o.time) \land (\tau_i - o.time) \in I$
$T(\phi \,\mathcal{U}_I\, \psi, \tau_i) \to \exists o : TP \cdot (o.time \geq \tau_i \land (o.time - \tau_i) \in I \land T(\psi, o.time)$
    $\land\ \forall o' : TP \cdot (\tau_i \leq o'.time < o.time \Rightarrow T(\phi, o'.time)))$
$T(\phi \,\mathcal{S}_I\, \psi, \tau_i) \to \exists o : TP \cdot (o.time \leq \tau_i \land (\tau_i - o.time) \in I \land T(\psi, o.time)$
    $\land\ \forall o' : TP \cdot (\tau_i \geq o'.time > o.time \Rightarrow T(\phi, o'.time)))$
$T(\phi) \to T(\phi, \tau_1)$


Fig. 5. Translation rules from MFOTL to FOL*. TP is an internal class of relational
objects used to represent time values at different time points. The predicate
NEXT($t_1$, $t_2$) (PREV($t_1$, $t_2$)) asserts that $t_1$ is the next (previous) time value of $t_2$.


Representing time points in FOL*. Since FOL* quantifiers are limited to relational
objects, to quantify over time points (which is necessary to capture the
semantics of MFOTL temporal operators such as $\mathcal{U}$), the translated FOL* formulas
use a special internal class of relational objects TP (e.g., $\exists o : TP$). Relational
objects of class TP capture all possible time points in a trace, and they have
two attributes, $ext$ and $time$, to record the existence and the value of the time
point, respectively. To ensure that every time value in a solution is represented
by some relational object of TP, we introduce the time coverage FOL* axiom.


Axiom 1 (Time coverage). Let $\phi_f$ be an FOL* formula and let $\sigma$ be its solution.
For every relational object $o \in \sigma$, there exists an object $o'$ of class TP s.t. $o$ and
$o'$ share the same time value. Formally, $\forall o \cdot (\exists o' : TP \cdot o.time = o'.time)$.


The translation of $\bigcirc_I \phi$ uses function NEXT($t_1$, $t_2$) to assert that $t_1$ is the next
time value of $t_2$. Formally, NEXT($t_1$, $t_2$) $= \forall o : TP \cdot o.time > t_2 \Rightarrow t_1 \leq o.time$.
Function PREV($t_1$, $t_2$) for translation of $\bullet_I \phi$ is defined similarly.
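
For a fixed, finite set of time-point values, the NEXT predicate can be evaluated directly. The following sketch is an illustration only: it reads the comparison in NEXT as non-strict, and it assumes that PREV is the mirror image of NEXT, since the text only says that it is defined similarly. The function names are hypothetical.

```python
def next_tp(t1, t2, tp_times):
    """NEXT(t1, t2): every TP time strictly after t2 is at or after t1."""
    return all(t1 <= t for t in tp_times if t > t2)

def prev_tp(t1, t2, tp_times):
    """PREV(t1, t2), assumed symmetric: every TP time strictly before t2
    is at or before t1."""
    return all(t <= t1 for t in tp_times if t < t2)

tp_times = [0, 5, 9]
print(next_tp(5, 0, tp_times))   # 5 is the first time value after 0
print(next_tp(9, 0, tp_times))   # 9 is not: 5 lies in between
print(prev_tp(5, 9, tp_times))   # 5 is the last time value before 9
print(prev_tp(0, 9, tp_times))   # 0 is not: 5 lies in between
```

In the translation rules the predicate is used together with the interval constraint and the existence of a TP object at $t_1$, which this sketch does not model.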


Definition 5 (Mapping from MFOTL trace to FOL* trace). Let an
MFOTL trace $(\bar{D}, \bar{\tau})$ and a valuation function $v$ be given. A function
$M((\bar{D}, \bar{\tau}), v) \to (D, v')$ is a mapping between an MFOTL trace and an FOL*
trace if $M$ satisfies the following rules: (1) for every $\tau_i \in \bar{\tau}$, there exists a relational
object $o : TP \in D$ such that $\tau_i = v'(o.time)$; (2) for every structure
$D_i \in \bar{D}$, if a tuple $\bar{t}$ holds for a relation $r$ (i.e., $\bar{t} \in r^{D_i}$), then there exists a
relational object $o : r$ such that for every $j \in [1, \iota(r)]$, $\bar{t}[j] = v'(o[j])$ and $v'(o.time) =
\tau_i \land v'(o.ext) = \top$; (3) for every term $t$ defined for $v$, $v(t) = v'(T(t, \tau_i))$.


The inverse of $M$, denoted as $M^{-1}$, is defined as follows: (1) $\bar{\tau} =$
SORT($\{v'(o.time) \mid o : TP \in D \land v'(o.ext)\}$) and (2) for every relational object
$o : r$, if $v'(o.ext)$, then $(v'(o[1]), \ldots, v'(o[\iota(r)])) \in r^{D_i}$, where $i$ is the index of the
time value $v'(o.time)$ in $\bar{\tau}$.


Lemma 1. Given an MFOTL formula $\phi$, an MFOTL trace $(\bar{D}, \bar{\tau})$, a valuation
function $v$, and a time point $i$, the relation $(\bar{D}, \bar{\tau}, v, i) \models \phi$ holds iff there exists
a satisfying trace $\sigma = (D, v')$ for the formula $T(\phi, \tau_i)$.


Proof Sketch. In the proof, we use $M$ and $M^{-1}$ (see Definition 5) to transform
an MFOTL solution into an FOL* trace, and show that it is a solution to the
translated FOL* formula (and vice versa).

$\Rightarrow$: if $(\bar{D}, \bar{\tau}, v, i) \models \phi$, then it is sufficient to show $(D, v') \leftarrow M(\bar{D}, \bar{\tau}, v)$ is
an FOL* solution. To prove $(D, v')$ is the solution to $T(\phi, \tau_i)$, we consider all the
translation rules in Fig. 5. The translated FOL* matches the semantics (Fig. 3) of
MFOTL except for the translation of temporal operators (e.g., $T(\bigcirc_I \phi, \tau_i)$ and
$T(\phi \,\mathcal{U}_I\, \psi, \tau_i)$) where instead of quantifying over time points (e.g., $\exists j$ and $\forall k$),
internal relational objects of class TP ($o, o' : TP$) are quantified over. By rule
(1) of Def. 5, every time point and its time value are mapped to some relational
object of class TP. Therefore, the quantifiers on time points can be translated
into the quantifiers on the relational objects of TP. The mapped solution $(D, v')$
also satisfies Axiom 1 because if a tuple $\bar{t}$ holds for some relation $r$ at some time
$\tau$ in the MFOTL trace $(\bar{D}, \bar{\tau})$, then there exists a time point $i \in [1, |\bar{\tau}|]$ such that
$\tau_i = \tau$. Therefore, by rule (1) of $M$, $\tau_i$ is represented by some $o : TP$.

$\Leftarrow$: if $(D, v') \models T(\phi, \tau_i)$, then it is sufficient to show that the MFOTL trace
$(\bar{D}, \bar{\tau}, v) \leftarrow M^{-1}(D, v')$ satisfies $\phi$ at point $i$ (i.e., $(\bar{D}, \bar{\tau}, v, i) \models \phi$). To prove
$(\bar{D}, \bar{\tau}, v, i) \models \phi$, we consider all the translation rules in Fig. 5. The translated
FOL* formula matches the semantics of MFOTL (Fig. 3) except for the difference
between the time points and the relational objects of class TP. By Axiom 1,
every relational object’s time is captured by some time point, and by rule (2) of
$M^{-1}$, every relational object is mapped onto some structure $D_i$ at some time $\tau_i$
by $M$. Therefore, $(\bar{D}, \bar{\tau}, v, i) \models \phi$.


Theorem 1 (Translation Correctness). Given an MFOTL formula $\phi$ and an
MFOTL trace $\sigma$, let $M(\sigma)$ be the FOL* solution mapped from $\sigma$ using function $M$
(Definition 5). Then (1) $\sigma \models \phi$ if and only if $M(\sigma) \models T(\phi)$, and (2) $vol(\sigma) =
vol(M(\sigma)) - |\{o : TP \in M(\sigma)\}|$, where $|\{o : TP \in M(\sigma)\}|$ is the number of
relational objects of the internal class TP in the solution $M(\sigma)$.


Proof. Statement (1) of Thm. 1 is a direct consequence of Lemma 1. Statement
(2) is the result of rule (2) in Definition 5 because every relational object in
the FOL* solution, except for the internal ones, i.e., $o : TP$, has a one-to-one
correspondence to tuples that hold for some relation in the MFOTL solution.

For the rest of the paper, we assume that the internal relational objects of
class TP do not count toward the volume of the FOL* solution, i.e., $vol(\sigma) = vol(M(\sigma))$.


Example 7. Consider a formula $exp = \Box\, \forall d \cdot (A(d) \Rightarrow \Diamond_{[5,10]} B(d))$, where $A$
and $B$ are unary relations. The translated FOL* formula $T(exp)$ is: $\forall o : TP \cdot \forall a :
A \cdot (o.time = a.time \Rightarrow \exists o' : TP \cdot \exists b : B \cdot o'.time = b.time \land a[1] = b[1] \land o.time + 5 \leq
o'.time \leq o.time + 10)$. Since $o.time = a.time$ and $o'.time = b.time$, we can
substitute $o.time$ and $o'.time$ with $a.time$ and $b.time$ in $T(exp)$, respectively.
Then, the formula contains no reference to $o$ and $o'$, and we can safely drop
the quantified $o$ and $o'$ (we can drop the existentially quantified TP relational object
because of the time coverage axiom). The simplified formula is: $\forall a : A \cdot \exists b :
B \cdot a[1] = b[1] \land a.time + 5 \leq b.time \leq a.time + 10$.

This is important for designing system requirements that comply with LPs. 


Given an MFOTL property $P$ and a set $Reqs$ of MFOTL requirements, and
a volume bound $vb$, the BSC problem can be solved by searching for a satisfying
solution $v'$ for the FOL* formula $T(\neg P) \land \bigwedge_{\psi \in Reqs} T(\psi)$ in a domain $D$ with at
most $vb$ relational objects.


4.3 Checking MFOTL Satisfiability: A Naive Approach 


Below, we define a naive procedure NBS (shown in Fig. 4) for checking satisfiability
of MFOTL formulas translated into FOL*. We then discuss the complexity
of this naive procedure. Even though we do not use NBS in this paper, its
complexity constitutes an upper bound for our approach proposed in Sect. 5.

Searching for a satisfying solution. Let $\phi_f$ be an FOL* formula translated
from an MFOTL formula $\phi$, and let $vb$ be the volume bound. NBS solves $\phi_f$
via quantifier elimination. The number of relational objects in any satisfying
solution of $\phi_f$ should be at most $vb$. Therefore, NBS grounds the FOL* formulas
within a domain of $vb$ relational objects (see Sect. 4.2), and then uses an SMT
solver to check satisfiability of the grounded formula. If the domain has multiple
classes of relational objects, we can unify them by introducing a “superposition”
class whose attributes are the union of the attributes of all classes and a special
“name” attribute to indicate the class represented by the superposition.

Complexity. The size of the quantifier-free formula is $O(vb^k)$, where $k$ is the
maximum depth of quantifier nesting. Since the background theory used in $\phi_f$
is restricted to linear integer arithmetic, solving the formula is NP-hard [29].
Because $T$ (Fig. 5) is linear in the size of the formula $\phi$, NBS is NP-complete
w.r.t. the size of the grounded formula, $vb^k$.


5 Incremental Search for Bounded Counterexamples 


The naive BSC approach (NBS) proposed in Sect. 4.3 is inefficient for solv- 
ing the translated FOL* formulas given a large bound n due to the size of the 
ground formula. Moreover, NBS cannot detect unbounded unsatisfiability, and 
cannot provide optimality guarantees on the volume of counterexamples which 
are important for establishing the proof of unbounded correctness and localiz- 
ing faults [15], respectively. In this section, we propose an incremental procedure 
IBS, which can detect unbounded unsatisfiability and provide the shortest coun- 
terexamples. An overview of IBS is given in Fig. 4. 

IBS maintains an under-approximation of the search domain and the FOL*
constraints. It uses the search domain to ground the FOL* constraints, and an
SMT solver to determine the satisfiability of the grounded constraints. It
analyzes the SMT result and accordingly either expands the search domain, refines
the FOL* constraints, or returns an answer to the satisfiability checking problem
(a counterexample $\sigma$, “bounded-UNSAT”, or “UNSAT”). The procedure
continues until an answer is obtained ($\sigma$ or UNSAT), or until the domain exceeds the
bound $vb$, in which case a “bounded-UNSAT” answer is returned.

In the following, we describe IBS in more detail. We explain the key compo- 
nent of IBS, computing over- and under-approximation queries, in Sect. 5.1. We 
discuss the algorithm itself in Sect. 5.2 and illustrate it in Sect. 5.3. We prove its 
soundness, completeness, and solution optimality in the extended version [11]. 


5.1 Over- and Under-Approximation 


NBS grounds the input FOL* formulas in a fixed domain $D$ (fixed by the bound
$vb$). Instead, IBS under-approximates $D$ to $D_{\downarrow}$ such that $D_{\downarrow} \subseteq D$. With $D_{\downarrow}$,
we can create an over- and an under-approximation query to the bounded
satisfiability checking problem. Such queries are used to check the satisfiability of
FOL* formulas with domain $D_{\downarrow}$. IBS starts with a small domain $D_{\downarrow}$, and
gradually expands it until either SAT or UNSAT is returned, or the domain size
exceeds some limit (bounded-UNSAT).


Over-approximation. Let $\phi_f$ be an FOL* formula, and $D_{\downarrow}$ be a domain of
relational objects. The procedure GROUND, $G(\phi_f, D_{\downarrow})$, encodes $\phi_f$ into a
quantifier-free FOL formula $\phi_g$ s.t. the unsatisfiability of $\phi_g$ implies the unsatisfiability of
$\phi_f$. We call $\phi_g$ an over-approximation of $\phi_f$. The procedure $G$ (Algorithm 2)
recursively traverses the syntax tree of the input FOL* formula from top to
bottom.

To eliminate the existential quantifier in $\exists o : r \cdot \phi'_f$ (L:1), $G$ creates a new
relational object $o'$ of class $r$ (L:2), and replaces $o$ with $o'$ in $\phi'_f$ (L:3). To
eliminate the universal quantifier in $\forall o : r \cdot \phi'_f$ (L:4), $G$ grounds the formula
in $D_{\downarrow}$. More specifically, $G$ expands the quantifier into a conjunction of clauses
where each clause is $o'.ext \Rightarrow \phi'_f[o \leftarrow o']$ (i.e., $o$ is replaced by $o'$ in $\phi'_f$) for each
relational object $o'$ of class $r$ in $D_{\downarrow}$ (L:5). Intuitively, an existentially quantified
relational object is instantiated with a new relational object, and a universally
quantified relational object is instantiated with every existing relational object
of the same class in $D_{\downarrow}$, which does not include the ones instantiated during $G$.
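
The grounding procedure described above can be sketched in Python as follows. This is a minimal illustration under assumptions, not the authors' implementation: the formula representation (`Atom`, `NAry`, `Quant`), the helper `subst`, and the naming scheme for fresh objects are all hypothetical. Existential quantifiers introduce a fresh object, while universal quantifiers are expanded only over the objects already present in the given domain.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Atom:
    """Quantifier-free leaf; each arg is an int or an (object, attribute) pair."""
    op: str
    args: tuple

@dataclass(frozen=True)
class NAry:
    """op is "and", "or" or "implies"; parts are sub-formulas."""
    op: str
    parts: tuple

@dataclass(frozen=True)
class Quant:
    """kind is "exists" or "forall"; var ranges over objects of class cls."""
    kind: str
    var: str
    cls: str
    body: object

def subst(f, var, name):
    """Return f with object variable var replaced by the object called name."""
    if isinstance(f, Atom):
        return Atom(f.op, tuple((name, a[1]) if isinstance(a, tuple) and a[0] == var else a
                                for a in f.args))
    if isinstance(f, NAry):
        return NAry(f.op, tuple(subst(p, var, name) for p in f.parts))
    if f.var == var:  # var is re-bound by an inner quantifier
        return f
    return Quant(f.kind, f.var, f.cls, subst(f.body, var, name))

def ext(name):
    return Atom("ext", ((name, "ext"),))

def ground(f, domain, new_objs, fresh):
    """domain maps existing object names to classes; objects created for
    existential quantifiers are recorded in new_objs, not in domain."""
    if isinstance(f, Quant) and f.kind == "exists":
        name = f"new{next(fresh)}"
        new_objs[name] = f.cls
        return NAry("and", (ext(name),
                            ground(subst(f.body, f.var, name), domain, new_objs, fresh)))
    if isinstance(f, Quant) and f.kind == "forall":
        return NAry("and", tuple(
            NAry("implies", (ext(o), ground(subst(f.body, f.var, o), domain, new_objs, fresh)))
            for o, cls in domain.items() if cls == f.cls))
    if isinstance(f, NAry):
        return NAry(f.op, tuple(ground(p, domain, new_objs, fresh) for p in f.parts))
    return f  # quantifier-free case

# exists o : A . forall p : A . p.val <= o.val, grounded in a domain with one A and one B.
formula = Quant("exists", "o", "A",
                Quant("forall", "p", "A", Atom("le", (("p", "val"), ("o", "val")))))
new_objs = {}
grounded = ground(formula, {"a1": "A", "b1": "B"}, new_objs, count())
print(new_objs)
print(grounded)
```

Note that the universal quantifier is expanded over `a1` only: the fresh object created for the existential quantifier is not part of the domain, and with an empty domain the expansion is an empty conjunction, which is trivially true. This is what makes the grounded formula an over-approximation.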


Lemma 2 (Over-approximation Query). For an FOL* formula $\phi_f$, and a
domain $D_{\downarrow}$, if $\phi_g = G(\phi_f, D_{\downarrow})$ is UNSAT, then so is $\phi_f$.


Under-Approximation. Let $\phi_f$ be an FOL* formula, and $D_{\downarrow}$ be a domain.
The over-approximation $\phi_g = G(\phi_f, D_{\downarrow})$ contains a set of new relational objects
introduced by $G$ (L:2), denoted by $NewRs$. Let NONEWR($NewRs$, $D_{\downarrow}$) be
constraints that enforce that every new relational object $o_1$ in $NewRs$ be
semantically equivalent to some relational object $o_2$ in $D_{\downarrow}$. Formally, the
predicate NONEWR($NewRs$, $D_{\downarrow}$) is defined as $\bigwedge_{o_1 \in NewRs} \bigvee_{o_2 \in D_{\downarrow}} (o_1 \equiv o_2)$,
where the semantically equivalent relation between $o_1$ and $o_2$ (i.e., $o_1 \equiv o_2$)
is defined as CLS($o_1$) $=$ CLS($o_2$) $\land \bigwedge_{i=1}^{\iota(\mathrm{CLS}(o_1))} (o_1[i] = o_2[i]) \land o_1.ext =
o_2.ext \land o_1.time = o_2.time$ (where CLS($o$) returns the class of $o$). Let
$\phi_g^{\bot} = \phi_g \land$ NONEWR($NewRs$, $D_{\downarrow}$). If $\phi_g^{\bot}$ has a satisfying solution, then there
is a solution for $\phi_f$. We call $\phi_g^{\bot}$ an under-approximation of $\phi_f$ and denote the
procedure for computing it by UNDERAPPROX($\phi_f$, $D_{\downarrow}$).
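
The NONEWR constraint can be illustrated for a concrete valuation, where attribute values are already fixed, so that the constraint reduces to a Boolean check. The dictionary encoding of relational objects below is a hypothetical choice made for this sketch; in the actual procedure the attributes are solver variables and the constraint is added to the query.

```python
def equivalent(o1, o2):
    """Semantic equivalence: same class, same regular attributes,
    same ext flag and same time."""
    return (o1["cls"] == o2["cls"] and o1["attrs"] == o2["attrs"]
            and o1["ext"] == o2["ext"] and o1["time"] == o2["time"])

def no_new_r(new_objs, domain):
    """Every newly introduced object coincides with some object of the domain."""
    return all(any(equivalent(n, d) for d in domain) for n in new_objs)

existing = {"cls": "Collect", "attrs": (0, 0), "ext": True, "time": 0}
same = {"cls": "Collect", "attrs": (0, 0), "ext": True, "time": 0}
later = {"cls": "Collect", "attrs": (0, 0), "ext": True, "time": 7}

print(no_new_r([same], [existing]))    # the new object duplicates an existing one
print(no_new_r([later], [existing]))   # differs in time, so it is genuinely new
print(no_new_r([same], []))            # an empty domain admits no new objects
```

The last case shows why the under-approximation query fails on an empty domain whenever the formula needs at least one object, which is what triggers domain expansion.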


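Lemma 3 (Under-Approximation Query). For an FOL* formula $\phi_f$, and
a domain $D_{\downarrow}$, let $\phi_g = G(\phi_f, D_{\downarrow})$ and $\phi_g^{\bot} =$ UNDERAPPROX($\phi_f$, $D_{\downarrow}$). If $\sigma$ is a
solution to $\phi_g^{\bot}$ then there exists a solution to $\phi_f$.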


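Algorithm 1. IBS: search for a bounded (by $vb$) solution to $T(\neg P) \land \bigwedge_{\psi \in Reqs} T(\psi)$.
Input an MFOTL formula $\neg P$, and MFOTL requirements $Reqs = \{\psi_1, \psi_2, \ldots\}$.
Optional Input $vb$, the volume bound, and data constraints $T_{data}$.
Output a counterexample $\sigma$, UNSAT or bounded-UNSAT.

 1: $Reqs_f \leftarrow \{\psi_f = T(\psi) \mid \psi \in Reqs\}$
 2: $\neg P_f \leftarrow T(\neg P)$
 3: $Reqs_{\downarrow} \leftarrow \emptyset$ //initially empty requirement
 4: $D_{\downarrow} \leftarrow \emptyset$ //initially empty domain
 5: while $\top$ do
 6:     $\phi_{\downarrow} \leftarrow \neg P_f \land Reqs_{\downarrow}$
 7:     $\phi_g \leftarrow G(\phi_{\downarrow}, D_{\downarrow})$ //over-approx.
 8:     $\phi_g^{\bot} \leftarrow$ UNDERAPPROX($\phi_{\downarrow}$, $D_{\downarrow}$) //under-approx.
 9:     if SOLVE($\phi_g \land T_{data}$) = UNSAT then
10:         return UNSAT
11:     $\sigma \leftarrow$ SOLVE($\phi_g^{\bot} \land T_{data}$)
12:     if $\sigma$ = UNSAT then //expand $D_{\downarrow}$
13:         $\sigma_{min} \leftarrow$ MINIMIZE($\phi_g$)
14:         //expand based on $\sigma_{min}$
15:         $D_{\downarrow} \leftarrow D_{\downarrow} \cup \{o \mid o \in \sigma_{min}\}$
16:         if $vol(\sigma_{min}) > vb$ then
17:             return bounded-UNSAT
18:     else //check all requirements
19:         if $\sigma \models \psi_f$ for every $\psi_f \in Reqs_f$ then
20:             return $\sigma$
21:         else
22:             $lesson \leftarrow \psi_f$ for some $\sigma \not\models \psi_f$
23:             $Reqs_{\downarrow}$.add($lesson$)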


Algorithm 2. G: ground an NNF FOL* formula $\phi_f$ in a domain $D_{\downarrow}$.
Input an FOL* formula $\phi_f$ in NNF, and a domain of relational objects $D_{\downarrow}$.
Output a grounded quantifier-free formula $\phi_g$ over relational objects.

1: if match($\phi_f$, $\exists o : r \cdot \phi'_f$) then //process the existential operator
2:     $o' \leftarrow$ NEWACT($r$) //create a new relational object of class $r$
3:     return $o'.ext \land G(\phi'_f[o \leftarrow o'], D_{\downarrow})$
4: if match($\phi_f$, $\forall o : r \cdot \phi'_f$) then //process the universal operator
5:     return $\bigwedge_{o' : r \in D_{\downarrow}} o'.ext \Rightarrow G(\phi'_f[o \leftarrow o'], D_{\downarrow})$
6: if match($\phi_f$, $\phi'_f$ op $\psi'_f$ where op = $\land \mid \lor$) then return $G(\phi'_f, D_{\downarrow})$ op $G(\psi'_f, D_{\downarrow})$
7: return $\phi_f$ //case where $\phi_f$ is quantifier-free, including $\neg\phi'_f$ where $\phi'_f$ is atomic (NNF)


The proofs of Lemmas 2 and 3 are in the extended version [11].
Suppose, for some domain $D_{\downarrow}$, that an over-approximation query $\phi_g$ for an FOL*
formula $\phi_f$ is satisfiable while the under-approximation query $\phi_g^{\bot}$ is UNSAT.
Then, the solution to $\phi_g$ provides hints on how to expand $D_{\downarrow}$ to potentially
obtain a satisfying solution for $\phi_f$, as captured in Corollary 1.


Corollary 1 (Necessary relational objects). For an FOL* formula $\phi_f$ and
a domain $D_{\downarrow}$, let $\phi_g$ and $\phi_g^{\bot}$ be the over- and under-approximation queries of
$\phi_f$ based on $D_{\downarrow}$, respectively. Suppose $\phi_g$ is satisfiable and $\phi_g^{\bot}$ is UNSAT, then
every solution to $\phi_f$ contains some relational object in formula $\phi_g$ but not in $D_{\downarrow}$.


5.2 Counterexample-Guided Constraint Solving Algorithm


Let an MFOTL formula $\neg P$ (to find a satisfiable counterexample to $P$), a set of
MFOTL requirements $Reqs$, an optional volume bound $vb$, and optionally a set
of FOL* data domain constraints $T_{data}$ be given. IBS, shown in Algorithm 1,
searches for a solution $\sigma$ to $\neg P \land \bigwedge_{\psi \in Reqs} \psi$ (with respect to $T_{data}$) bounded by
$vb$, as a counterexample to $\bigwedge_{\psi \in Reqs} \psi \Rightarrow P$ (Definition 2). If
no such solution is possible regardless of the bound, IBS returns UNSAT. If no
solution can be found within the given bound, but a solution may exist for a
larger bound, then IBS returns bounded-UNSAT. If $vb$ is not specified, IBS will
perform the search unboundedly until a solution or UNSAT is returned.

IBS first translates $\neg P$ and every $\psi \in Reqs$ into FOL* formulas in $Reqs_f$,
denoted by $\neg P_f$ and $\psi_f$, respectively. Then IBS searches for a satisfying solution
to $\neg P_f \land \bigwedge_{\psi_f \in Reqs_f} \psi_f$ in the domain $D$ of volume at most $vb$. Instead
of searching in $D$ directly, IBS searches for a solution to $\neg P_f \land \bigwedge_{\psi_f \in Reqs_{\downarrow}} \psi_f$ in
$D_{\downarrow}$ (denoted by $\phi_{\downarrow}$) where $Reqs_{\downarrow} \subseteq Reqs_f$ and $D_{\downarrow} \subseteq D$. IBS initializes $Reqs_{\downarrow}$ and
$D_{\downarrow}$ as empty sets (LL:3-4). Then, for the FOL* formula $\phi_{\downarrow}$, IBS creates an over-
and under-approximation query $\phi_g$ (L:7) and $\phi_g^{\bot}$ (L:8), respectively (described
in Sect. 5.1). IBS first solves the over-approximation query $\phi_g$ by querying an
SMT solver (L:9). If $\phi_g$ is unsatisfiable, then $\phi_{\downarrow}$ is unsatisfiable (Lemma 2), and
IBS returns UNSAT (L:10).

If $\phi_g$ is satisfiable, then IBS solves the under-approximation query $\phi_g^{\bot}$ (L:11).
If $\phi_g^{\bot}$ is unsatisfiable, then the current domain $D_{\downarrow}$ is too small, and IBS expands
it (LL:12-18). This is because the satisfiability of $\phi_g$ indicates the possibility of
finding a satisfying solution after adding at least one of the new relational objects
in the solution to $\phi_g$ to $D_{\downarrow}$ (Corollary 1). The domain $D_{\downarrow}$ is expanded by adding
all relational objects $o'$ in the minimum (in terms of volume) solution $\sigma_{min}$
to $\phi_g$ (L:13). To obtain $\sigma_{min}$, we follow MaxRes [28] methods: we analyze the
UNSAT core of $\phi_g^{\bot}$ and incrementally weaken $\phi_g^{\bot}$ towards $\phi_g$ (i.e., the weakened
query $\phi'_g$ is an “over-under approximation” that satisfies $\phi_g^{\bot} \Rightarrow \phi'_g \Rightarrow \phi_g$)
until a satisfying solution $\sigma_{min}$ is obtained for the weakened query. However, if
the volume of $\sigma_{min}$ exceeds $vb$ (L:16), then bounded-UNSAT is returned (L:17).
UNSAT core-guided domain expansion has also been explored for unfolding the
definition of recursive functions [30,37].

On the other hand, if $\phi_g^{\top}$ yields a solution $\sigma$, then $\sigma$ is checked
on $Reqs_f$ (L:19). If $\sigma$ satisfies every $\psi_f$ in $Reqs_f$, then $\sigma$ is
returned (L:20). If $\sigma$ violates some requirements in $Reqs_f$, then the violating
requirement lesson is added to $Reqs_{\downarrow}$ to be considered in the search for
the next solutions (L:23).

If IBS does not find a solution or does not return UNSAT, it means that
no solution is found because $D_{\downarrow}$ is too small or $Reqs_{\downarrow}$ is too
weak. IBS then restarts with the expanded domain $D_{\downarrow}$ or the refined set
of requirements $Reqs_{\downarrow}$. It computes the over- and under-approximation
queries ($\phi_g^{\bot}$ and $\phi_g^{\top}$) again, and repeats the steps. See Sect. 5.3
for an illustration of IBS.
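
The control flow described above can be sketched on a toy problem. The sketch
below is not Algorithm 1 itself: it assumes that a solution is simply a set of
named objects, that the negated property and the requirements are Python
predicates over such sets, and that a brute-force subset search stands in for
the SMT queries. The names `solve` and `ibs` are hypothetical.

```python
from itertools import combinations


def solve(universe, constraints, max_size=None):
    """Return a smallest subset of `universe` satisfying every constraint, or None.

    Brute-force stand-in for an SMT query (an assumption of this sketch).
    Subsets are tried by increasing size, so the first hit has minimum volume.
    """
    objs = sorted(universe)
    limit = len(objs) if max_size is None else min(max_size, len(objs))
    for k in range(limit + 1):
        for cand in combinations(objs, k):
            s = frozenset(cand)
            if all(c(s) for c in constraints):
                return s
    return None


def ibs(neg_property, reqs, universe, vb=None):
    d_down = set()   # objects currently allowed in the under-approximation
    reqs_down = []   # requirements learned from violating solutions
    while True:
        active = [neg_property] + reqs_down
        over = solve(universe, active)        # over-approximation: any object
        if over is None:
            return "UNSAT"
        under = solve(d_down, active, vb)     # under-approximation: d_down only
        if under is None:
            if vb is not None and len(over) > vb:
                return "bounded-UNSAT"
            d_down |= over                    # expand with the minimum solution
            continue
        violated = [r for r in reqs if not r(under)]
        if not violated:
            return under                      # satisfies every requirement
        reqs_down.append(violated[0])         # refine and search again
```

With the universe `{"a", "b", "c"}`, a negated property requiring `"a"`, and
requirements "a implies b" and "b implies c", the loop alternates between
domain expansion and requirement refinement before returning all three objects;
with `vb=2` it reports bounded-UNSAT instead.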


Remark 1. IBS finds the optimal solution because it looks for the minimum
solution $\sigma_{min}$ to the over-approximation query $\phi_g^{\bot}$ (L:13) and uses it
for domain expansion (L:15). However, looking for $\sigma_{min}$ adds cost. If solution
optimality is not required, IBS can be configured to heuristically find a
solution $\sigma$ to $\phi_g^{\bot}$ such that $vol(\sigma) \le vb$. The greedy best-first
search (gBFS) finds a solution to $\phi_g^{\bot}$ that minimizes the number of
relational objects that are not already in $D_{\downarrow}$, and then uses it to expand
$D_{\downarrow}$. We configured a non-optimal version of IBS (nop) that
uses gBFS heuristics and evaluated its performance in Sect. 6.
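
The selection criterion of the heuristic can be illustrated in a few lines. This
is only a sketch of the criterion: it assumes the candidate solutions are
already available as explicit sets of object names, whereas the tool obtains
them from the solver. The function name is hypothetical.

```python
def fewest_new_objects(candidates, d_down):
    """Pick the candidate that adds the fewest objects not already in d_down.

    `candidates` is an iterable of sets of object names (an assumption of this
    sketch); ties are broken by the order in which candidates are listed.
    """
    known = set(d_down)
    return min(candidates, key=lambda sol: len(set(sol) - known))


d_down = {"access1", "update1"}
candidates = [
    {"access1", "collect1", "update2"},   # two objects outside d_down
    {"access1", "update1", "collect2"},   # one object outside d_down
]
chosen = fewest_new_objects(candidates, d_down)
```

Here the second candidate is chosen because it introduces a single new object.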


5.3 Illustration of IBS 


Suppose a data collection centre (DCC) collects and accesses personal data
information with two requirements: $req_1$: a data value can only be updated
after having been collected or last updated for more than a week (168 hours);
and $req_2$: data can only be accessed if it has been collected or updated within
a week (168 hours). The signature $S_{data}$ for DCC contains three binary
relations ($R_{data}$): Collect, Update, and Access, such that Collect(d, v),
Update(d, v) and Access(d, v) hold at a given time point if and only if data at
ID d is collected, updated, and accessed with value v at this time point,
respectively. The MFOTL formulas for P1, $req_1$ and $req_2$ are shown in Fig. 1.
Suppose IBS is invoked to find a counterexample for property P1 (shown in
Fig. 1) subject to requirements $Reqs = \{req_1, req_2\}$ with the bound
$vb = 4$. IBS translates the requirements and the property to FOL* and
initializes $Reqs_{\downarrow}$ and $D_{\downarrow}$ to empty sets. For each iteration, we
use $\phi_g^{\bot}$ and $\phi_g^{\top}$ to represent the over- and under-approximation
queries computed on LL:7-8, respectively.


1st iteration: $D_{\downarrow} = \emptyset$ and $Reqs_{\downarrow} = \emptyset$. Three new
relational objects are introduced to $\phi_g^{\bot}$ (due to $\neg P1$): $access_1$,
$collect_1$, and $update_1$ such that: (C1) $access_1$ occurs after $collect_1$
and $update_1$; (C2) $access_1.d = collect_1.d = update_1.d$; (C3)
$access_1.v \neq collect_1.v \land access_1.v \neq update_1.v$; and (C4) either
$collect_1$ or $update_1$ must be in the solution. $\phi_g^{\bot}$ is satisfiable, but
$\phi_g^{\top}$ is UNSAT since $D_{\downarrow}$ is an empty set. We assume $D_{\downarrow}$ is
expanded by adding $access_1$ and $update_1$.


2nd iteration: $D_{\downarrow} = \{access_1, update_1\}$ and $Reqs_{\downarrow} = \emptyset$.
The over-approximation $\phi_g^{\bot}$ stays the same, but $\phi_g^{\top}$ becomes
satisfiable since $access_1$ and $update_1$ are in $D_{\downarrow}$. Suppose the
solution is $\sigma_4$ (see Fig. 2). However, $\sigma_4$ violates $req_2$, so $req_2$ is
added to $Reqs_{\downarrow}$.


3rd iteration: $D_{\downarrow} = \{access_1, update_1\}$ and $Reqs_{\downarrow} = \{req_2\}$.
Two new relational objects are introduced in $\phi_g^{\bot}$ (due to $req_2$):
$collect_2$ and $update_2$ such that
(C5) $collect_2.time \le access_1.time \le collect_2.time + 168$;
(C6) $update_2.time \le access_1.time \le update_2.time + 168$;
(C7) $access_1.d = collect_2.d = update_2.d$;
(C8) $access_1.v = collect_2.v = update_2.v$; and
(C9) $collect_2$ or $update_2$ is in the solution.
The new $\phi_g^{\bot}$ is satisfiable, but $\phi_g^{\top}$ is UNSAT because
$update_2 \notin D_{\downarrow}$ and $update_1 \neq update_2$ (C8 conflicts with C3).
Therefore, $D_{\downarrow}$ needs to be expanded. Assume $collect_2$ is added to
$D_{\downarrow}$.


4th iteration: $D_{\downarrow} = \{access_1, update_1, collect_2\}$ and
$Reqs_{\downarrow} = \{req_2\}$. The over-approximation $\phi_g^{\bot}$ stays the same, but
$\phi_g^{\top}$ becomes satisfiable since $collect_2$ is in $D_{\downarrow}$. Suppose the
solution is $\sigma_3$ (see Fig. 2). Since $\sigma_3$ violates $req_1$, $req_1$ is added
to $Reqs_{\downarrow}$.


5th iteration: $D_{\downarrow} = \{access_1, update_1, collect_2\}$ and
$Reqs_{\downarrow} = \{req_1, req_2\}$. The following constraints are added to
$\phi_g^{\bot}$ (due to $req_1$):
(C9) $\neg(update_1.time - 168 \le collect_2.time \le update_1.time)$.
Since (C9) conflicts with (C8), (C7) and (C1), $update_1$ cannot be in the
solution to $\phi_g^{\bot}$. The over-approximation $\phi_g^{\bot}$ is satisfiable
if $collect_1$ (introduced in the 1st iteration) or $update_2$ (3rd iteration) is
in the solution. However, $\phi_g^{\top}$ is UNSAT since $D_{\downarrow}$ does not contain
$collect_1$ or $update_2$. Thus, $D_{\downarrow}$ is expanded. Assume $update_2$ is added
to $D_{\downarrow}$.


6th iteration: $D_{\downarrow} = \{access_1, update_1, collect_2, update_2\}$, $Reqs_{\downarrow} = \{req_1, req_2\}$.
The following constraints are added to ¢, (C10) update).time > update, .time+ 
168 (due to req,) and (C11) update,.time < update,.time (due to ~P). Since 
(C10) conflicts with (C11), update, cannot be in the solution to ¢,. Thus, ¢, 
is satisfiable only if $collect_1$ is in the solution. However, $\phi_g^{\top}$ is
UNSAT because $collect_1 \notin D_{\downarrow}$. Therefore, $D_{\downarrow}$ is expanded by
adding $collect_1$.


Final iteration: $D_{\downarrow} = \{access_1, update_1, collect_2, update_2, collect_1\}$
and $Reqs_{\downarrow} = \{req_1, req_2\}$. The under-approximation $\phi_g^{\top}$ becomes
satisfiable, and yields the solution $\sigma_5$ in Fig. 2 which satisfies both
$req_1$ and $req_2$.
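
To make the two DCC requirements concrete, the following sketch checks them on
a finite list of events. It encodes one reading of the prose statements of
$req_1$ and $req_2$, not the MFOTL formulas of Fig. 1: the inclusive 168-hour
boundaries, the `Event` type, and the example traces are all assumptions made
for illustration.

```python
from dataclasses import dataclass

WEEK = 168  # hours


@dataclass(frozen=True)
class Event:
    kind: str   # "collect", "update" or "access"
    time: int   # time point, in hours
    d: int      # data ID
    v: int      # value


def writes(trace, d):
    """All collect/update events on data ID d."""
    return [e for e in trace if e.d == d and e.kind in ("collect", "update")]


def satisfies_req1(trace):
    """No update happens within a week after another collect/update of the same data."""
    return not any(
        u.kind == "update" and w is not u and u.time - WEEK <= w.time <= u.time
        for u in trace
        for w in writes(trace, u.d)
    )


def satisfies_req2(trace):
    """Every access is preceded, within a week, by a collect/update of the same data."""
    return all(
        any(a.time - WEEK <= w.time <= a.time for w in writes(trace, a.d))
        for a in trace
        if a.kind == "access"
    )


# Hypothetical traces, not the solutions shown in Fig. 2.
stale_access = [Event("update", 0, 1, 7), Event("access", 200, 1, 9)]
early_update = [Event("collect", 0, 1, 7), Event("update", 100, 1, 9)]
compliant = [Event("collect", 0, 1, 7), Event("update", 169, 1, 9),
             Event("access", 170, 1, 9)]
```

In these traces, `stale_access` violates only $req_2$, `early_update` violates
only $req_1$, and `compliant` satisfies both.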


6 Evaluation 


To evaluate our approach, we developed a prototype tool, called LEGOS, that 
implements our MFOTL bounded satisfiability checking algorithm, IBS
(Algorithm 1). It includes a Python API for specifying system requirements and MFOTL
safety properties. We use pySMT [14] to formulate SMT queries and Z3 [8] to 
check their satisfiability. The implementation and the evaluation artifacts are 
included in the supplementary material [12]. In this section, we evaluate the effec- 
tiveness of our approach using five case studies, aiming to answer the following 
research question: How effective is our approach at determining the bounded sat- 
isfiability of MFOTL formulas? We measure effectiveness in terms of the ability 
to determine satisfiability (i.e., the satisfying solution and its volume, UNSAT, 
or bounded UNSAT), and performance, i.e., time and memory usage. 

Case studies. The five case studies considered in this paper are summarized
below: (1) PHIM (derived from [1,10]): a computer system for keeping track of
personal health information with cost management; (2) CF@H¹: a system for
monitoring COVID patients at home and enabling doctors to monitor patient
data; (3) PBC [4]: an approval policy for publishing business reports within a
company; (4) BST [4]: a banking system that processes customer transactions;
and (5) NASA [26]: an automated air-traffic control system design that aims to
avoid aircraft collisions.² Table 1 gives their statistics. For each case study, we


1 https://covidfreeathome.org/. 
2 The requirements and properties for the NASA case study are originally expressed
in LTL, which is subsumed by MFOTL.


390 N. Feng et al. 


record the number of requirements, relations, relation arguments, and properties, 
denoted as #reqs, #rels, #args, and #props, respectively. Additionally, Table 1 
shows initial configurations used in our experiments, with number of custodians 
(#c), patients (#p), and data (#d) for PHIM; number of users (#u), and data 
(#d) for CF@H and PBC; number of employees (#e), customers (#c), transac- 
tions (#t), and the maximum amount for a transaction (sup) for BST; number 
of ground-separated (#GSEP) and of the self-separating aircraft (#SSEP) for 
NASA. 


Table 1. Case study statistics.

Name   #reqs  #rels  #args    #props  Configuration
PHIM   ?      ?      [1-?]    7       #c = ?, #p = ?, #d = 5
CF@H   45     28     [?-?]    7       #u = 2, #d = 10
PBC    14     7      [1-2]    1       #u = 5, #d = 10
BST    10     3      [1-3]    3       #e = 1, #c = 2, #t = 4, sup = 10
NASA   194    10     [6-79]   6       #GSEP = 3, #SSEP = 0
                                      #GSEP = 2, #SSEP = 2

Table 2. Performance comparison between IBS and nuXmv on case study NASA.
Time is in seconds (sec) and memory in megabytes (MB).

        configuration 1                      configuration 2
        IBS               nuXmv              IBS               nuXmv
NASA    out.  time  mem   out.  time  mem    out.  time  mem   out.  time  mem
na1     U     0.80  154   U     0.88  82     U     0.13  141   U     1.65  90
na2     U     0.16  141   U     0.47  70     U     0.15  141   U     1.50  90
na3     U     0.16  141   U     0.49  83     U     0.13  141   U     1.48  90
na4     U     0.77  80    U     0.54  83     U     0.15  66    U     1.43  91
na5     U     0.14  140   U     0.52  82     U     0.15  141   U     1.43  90
na6     U     0.03  62    U     0.57  72     U     0.03  62    U     1.40  90


Case studies were selected for (i) the purpose of comparison with existing
works (i.e., NASA); (ii) checking whether our approach scales with case studies
involving data/time constraints (PBC, BST, PHIM and CF@H); or (iii) evaluating
the applicability of our approach with real-world case studies (CF@H
and NASA). In addition to prior case studies, we include PHIM and CF@H
which have complex data/time constraints. The number of requirements for the
five case studies ranges between ten (BST) and 194 (NASA). The number of
relations present in the MFOTL requirements ranges from three (BST) to 28
(CF@H), and the number of arguments in these relations ranges from 1 (PHIM,
PBC, and BST) to 79 (NASA).

Experimental setup. Given a set of requirements, data constraints and prop- 
erties of interest for each case study, we measured the run-time (time) and peak 
memory usage (mem.) of performing bounded satisfiability checking of MFOTL 
properties, and the volume $vol_{\sigma}$ (the number of relational objects) of the solution
($\sigma$) with (op) and without (nop) the optimality guarantees (see Remark 1 for
finding non-optimal solutions). We conduct two experiments: the first one evalu- 
ates the efficiency and scalability of our approach; the second one compares our 
approach with satisfiability checking. Since there is no existing work for check- 
ing MFOTL satisfiability, we compared with LTL satisfiability checking because 
MFOTL subsumes LTL. To study the scalability of our approach, our first exper- 
iment considers four different configurations obtained by increasing the data 
constraints of the case-study requirements. The initial configuration (small) is 
described in Table 1 and the initial bound is 10. The medium and large configura- 
tions are obtained by multiplying the initial data constraints and volume bound 


Early Verification of Legal Compliance via Bounded Satisfiability Checking 391 


Table 3. Run-time performance for four case studies and 18 properties. We record
the outcome (out.) of the algorithm with (op) or without (nop) the optimal solution
guarantee: UNSAT (U), bounded-UNSAT (b-U), or the volume of the counterexample
$\sigma$ (a natural number, corresponding to $vol_{\sigma}$). We consider four different
configurations: small (see Tab. 1), medium (×10), big (×100), and unbounded (∞)
data domain constraints and volume bound. Volume differences between op and nop
are bolded.


case studies small medium big unbounded 
out. Time Mem Jout. | Time Mem | out. Time Mem Jout. | Time Mem 
(sec) (MB) (sec) (MB) (sec) (MB) (sec) (MB) 
nop | op|nop | op  |nop | op |nop | op nop | op nop | op |nop | op nop | op nop | op | nop | op nop | op nop | op 
PHIM phi U 0.04 | 0.03/29] 29 |U 0.03 | 0.03 136 | 136| U 0.04 [0.04 |136 | 136 |U 0.06 | 0.05 64 | 64 
phy |U 0.03 | 0.03 | 138 | 138 |U 0.03 | 0.03 136 | 137| U 0.03 | 0.04 1136 | 136 |U 0.05 | 0.06 64 | 61 
pha |U 0.03 | 0.03 | 134 | 137 U 0.03 | 0.03 138 | 138| U 0.05 | 0.05 |137 | 138 |U 0.06 | 0.06 64 | 64 
pha U 0.04 | 0.04 | 136 | 138 |U 0.04 | 0.04 138 | 135| U 0.05 | 0.05 |138 | 138 |U 0.06 | 0.07 64 | 64 
phs U 0.02 | 0.02 |135 | 135 | U 0.02 | 0.02 608 | 608/56 | 56 |30.51 | 30.51 |390 | 390 156 |56 | 21.64 | 21.60 |393 | 390 
phe|b-U 0.18 | 0.20 |139 | 139| U 0.72 | 0.82 144 | 144| U 0.88 | 0.70 [142 |142 |U 0.91 | 0.91 70 | 70 
phr U 0.11 | 0.11 | 139 | 139/29 | 29 13.80 | 1905.40 | 193 | 599| 30 | 29 | 20.25 | 682.22 | 193 | 601 | 32 | 29 | 20.96 | 1035.87 | 123 | 383 
CFGH cf, b-U (4.80 | 6.90/ 114 | 176 |U 2.87 | 3.55 81/86 U 298| 1.71  |85|76 |U 1.71 | 0.74 74 | 68 
cf |b-U 0.87 | 0.93/70] 70 [14] 14 [3.21] 425.41 |79|334 14|14 [2.40] 778.36 |76|80 |14|14 13.32 |16.97 | 80| 205 
cfs b-U 1.38 | 1.31 |145 | 145|16| 16 (6.05 | 90.78 |168 | 403/16 |16 | 3.54 | 371.65 |157 |846|16 |16 |5.35 | 24.07 |86 | 164 
cfa b-U 1.52 |0.73|74|68 |14|14 [4.54] 65.59 |90|261 |14|14 [5.63/57.30 |95|261 |14|14 |5.65 | 1227.02 |89 | 294 
cfs |8|8 (1.20 |1.17|146 | 147/8|8 [0.48 [0.54 141 |142|8|8  [0.69/0.57 141|141|8|8 10.72 | 0.76 69 | 69 
cfs |8|8 |106 |1.16|146 |14718]8 [0.52 [0.61 142| 142|8|8 [0.60/0.73 141]141|8|8 (0.72 | 0.72 69 | 69 
ch |U 0.58 | 0.58 |141 | 142 |U 0.38 | 0.36 140 | 141 U 0.47 | 0.44 T140 | 141 /U 0.30 | 0.34 66 | 67 
PBC ph, |U 0.04 | 0.04/29 | 140 |U 0.16 | 0.17 140| 1399]9  (0.28]0.29 [141] 141/9]9 10.27 | 0.28 67 | 67 
BST bs, U 0.04 | 0.03/64] 63 |U 0.29 | 0.24 70/68 |U 031/030 69/68 IU 0.25 | 0.25 69 | 69 
bso |2|2 (0.04 | 0.04/62|64 |2]2 [0.04] 0.04 62/62 |2|2 [004/004 [64/64 (2/2 [0.04] 0.04 64] 64 
bsa [U 0.02 | 0.02/62] 62 [5/5 |04|09 70/73 (5/5 [039/085 |70/74 [5/5  |0.40|0.70 70 | 72 


by ten and hundred, respectively. The last (unbounded) configuration does not 
bound either the data domain or the volume. As we noted earlier in Sect. 4, the 
purpose of adding data constraints is to avoid unrealistic counterexamples. For 
example, the NASA case study uses a data set for specifying the possible system 
control modes and uses data ranges to restrict the possible measures from the 
aircraft (e.g., aircraft’s trajectory). In the other case studies, data constraints are 
realistic data ranges (e.g., a patient’s account balance should be non-negative). 
To study the performance of our approach relative to existing work, our second 
experiment considers two configurations of the NASA case study verified in [24] 
using the state-of-the-art symbolic model checker nuXmv [6]³. We compare our
approach’s result against the reproduced result of nuXmv verification. For both 
experiments, we report the analysis outcomes, i.e., the volume of the satisfy- 
ing solution (if one exists), UNSAT, or bounded UNSAT; and performance, i.e., 
time and memory usage. The experiments were conducted using a ThinkPad X1 
Carbon with an Intel Core i7 1.80 GHz processor, 8 GB of RAM, and running 
64-bit Ubuntu GNU/Linux 8. 

Results of the first experiment are summarized in Table 3. Out of the 72 
trials, our approach found 31 solutions. It also returned five bounded-UNSAT 
answers, and 36 UNSAT answers. The results show that our approach is effec- 
tive in checking satisfiability of case studies with different sizes. More precisely, 


3 LEGOS solved all configurations from the NASA case study; see the results in [12]. 
For comparison, we report only on the configurations that are explicitly supported 
by nuXmv. 
we observe that it takes under three seconds to return UNSAT and between
0.04 seconds (bs2:medium) and 32 min (ph7:medium:op) to return a solution. In
the worst case, op took 32 min for checking ph7, where the property and
requirements contain complex constraints. Effectively, ph7 requires the deletion of data
stored at id 10, while the cost of deletion increases over time under PHIM’s 
requirements. Therefore, the user has to perform a number of actions to obtain 
a sufficient balance to delete the data. Additionally, each action that increases 
the user’s balance has its own preconditions, effects, and time cost, making the 
process of choosing the sequence of actions to meet the increasing deletion cost 
non-trivial. 

We can see a difference in time between cf2 ‘big’ and ‘unbounded’. This is
because the domain expansion followed two different paths, one of which produces
significantly easier SMT queries. Since our approach is guided by counterexamples
(i.e., the path is guided by the solution from the SMT solver (Algorithm 1, L:13)),
it does not have direct control over the exact path selection. In future
work, we aim to add optimizations to avoid or backtrack from hard paths.

We observe that the data-domain constraint and volume bound used in different
configurations do not affect the performance of IBS when the satisfiability
of the instances does not depend on them, which is the case for all the instances
except for ph6-7:small, cf1-3:small, and bs3:small. As mentioned in Sect. 4, the
data-domain constraint ensures that satisfying solutions have realistic data
values. For ph1-ph4, the bound used in the small, medium and large configurations
creates additional constraints in the SMT queries for each relational object, and
therefore results in a larger peak memory than the unbounded configuration.

Finding the optimal solution (by op), in contrast to finding a satisfying solu- 
tion without the optimal guarantee (by nop), imposes a substantial computa- 
tional cost while rarely achieving a volume reduction. The non-optimal heuristic 
nop often outperformed the optimal approach for satisfiable instances. Out of 
31 satisfiable instances, nop solved 12 instances 3 times faster, 10 instances 10 
times faster and seven instances 20 times faster than op. Compared to the non- 
optimal solution, the optimal solution reduced the volume for only two instances: 
ph7:large and ph7:unbounded by one (3%) and three (9%), respectively. On all 
other satisfying instances, op and nop both find the optimal solutions. When 
there is no solution, both op and nop are equally efficient. 
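
The cost of optimality for ph7 can be recomputed from the entries reported in
Table 3. The snippet below is a small arithmetic check, not part of the tool;
the dictionary layout and the function name are chosen here for illustration.

```python
# (nop time [s], op time [s], nop volume, op volume) for ph7, from Table 3
ph7 = {
    "medium":    (13.80, 1905.40, 29, 29),
    "big":       (20.25, 682.22, 30, 29),
    "unbounded": (20.96, 1035.87, 32, 29),
}


def summarize(nop_time, op_time, nop_vol, op_vol):
    """Return (slowdown factor of op over nop, volume reduction in percent)."""
    slowdown = op_time / nop_time
    reduction = 100 * (nop_vol - op_vol) / nop_vol
    return slowdown, reduction


for config, row in ph7.items():
    slowdown, reduction = summarize(*row)
    print(f"{config}: op is {slowdown:.1f}x slower, volume reduced by {reduction:.1f}%")
```

The volume reductions come out at roughly 3% and 9% for the big and unbounded
configurations, and the longest op run, 1905.40 seconds, is about 32 minutes.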

Results of the second experiment are summarized in Table 2. Our approach 
and nuXmv both correctly verified that all six properties were UNSAT in both 
NASA configurations. We observe that the performance of our approach is
comparable to nuXmv for the first configuration with 0.10 to 0.20 seconds of difference
on average. Yet, for the second configuration, our approach terminates in less than
0.20 seconds and nuXmv takes 1.50 seconds on average. We conclude that our
approach’s performance is comparable to that of nuXmv for LTL satisfiability
checking even though our approach is not specifically designed for LTL.

Summary. In summary, we have demonstrated that our approach is effective 
at determining the bounded satisfiability of MFOTL formulas using case studies 
with different sizes and from different application domains. When restricted to 
LTL, our approach is at least as effective as the existing work on LTL
satisfiability checking, which uses a state-of-the-art symbolic model checker. Importantly,
IBS can often determine satisfiability of instances without reaching the volume 
bound, and its performance is not sensitive to the data domain. On the other 
hand, IBS’s optimal guarantee imposes a substantial computational cost while 
rarely achieving a volume reduction over non-optimal solutions obtained by nop. 
We need to investigate the trade-off between optimality and efficiency, as well 
as evaluate the performance of IBS on a broader range of benchmarks. 


7 Related Work 


Below, we compare with the existing approaches that address the satisfiability 
checking of temporal logic and first-order logic. 

Satisfiability checking of temporal properties. Temporal logic satisfiabil- 
ity checking has been studied for the verification of system designs. Satisfiability 
checking for Linear Temporal Logic (LTL) can be performed by reducing the 
problem to model checking [35], by applying automata-based techniques [25], 
or by SAT solving [5,21–23]. Satisfiability checking for metric temporal logic
(MTL) [32] and its variants, e.g., mission-time LTL [24] and signal temporal 
logic [2], has been studied for the verification of real-time system designs. These 
existing techniques are inadequate for our needs: LTL and MTL cannot effec- 
tively capture quantified data constraints commonly used in legal properties. 
MFOTL does not have such a limitation as it extends MTL and LTL with first- 
order quantifiers, thereby supporting the specification of data constraints. 
Finite model finding for first-order logic. Finite-model finders [7,33] look 
for a model by checking universal quantifiers exhaustively over candidate models 
with progressively larger domains; we look for finite-volume solutions using a sim- 
ilar approach. On the other hand, we consider an explicit bound on the volume 
of the solution, and are able to find the solution with the smallest volume. SMT 
solvers support quantifiers with quantifier instantiation heuristics [16, 17] such as 
E-matching [9,27] and conflict-based instantiation [34]. Quantifier instantiation 
heuristics are nonetheless generally incomplete, whereas, in our approach, we 
obtain completeness by bounding the volume of the satisfying solution. 


8 Conclusion 


In this paper, we proposed an incremental bounded satisfiability checking app- 
roach, called IBS, aimed to enable verification of legal properties, expressed in 
MFOTL, against system requirements. IBS first translates MFOTL formulas to 
first-order logic with relational objects (FOL*) and then searches for a satis- 
fying solution to the translated FOL* formulas in a bounded search space by 
deriving over- and under-approximating SMT queries. IBS starts with a small 
search space and incrementally expands it until an answer is returned or until the 
bound is exceeded. We implemented IBS on top of the SMT solver Z3. Experi- 
ments using five case studies showed that our approach is effective for identifying 
errors in requirements from different application domains. Our approach is
currently limited to verifying safety properties. In the future, we plan to extend our
approach so that it can handle a broader spectrum of property types, including 
liveness and fairness. IBS’s performance and scalability depend crucially on how 
the domain of relational objects is maintained and expanded. As future work, 
we would like to study the effectiveness of other heuristics to improve IBS’s 
scalability (e.g., random restart and expansion with domain-specific heuristics). 
We also aim to study how to learn/infer MFOTL properties during search to 
further improve the efficiency of our approach. 
Abstract. We apply and evaluate polynomial-time algorithms to compute
two different normal forms of propositional formulas arising in verification.
One of the normal form algorithms is presented for the first time.
The algorithms compute normal forms and solve the word problem for 
two different subtheories of Boolean algebra: orthocomplemented bisemi- 
lattice (OCBSL) and ortholattice (OL). Equality of normal forms decides 
the word problem and is a sufficient (but not necessary) check for equiva- 
lence of propositional formulas. Our first contribution is a quadratic-time 
OL normal form algorithm, which induces a coarser equivalence than the 
OCBSL normal form and is thus a more precise approximation of propo- 
sitional equivalence. The algorithm is efficient even when the input for- 
mula is represented as a directed acyclic graph. Our second contribution 
is the evaluation of OCBSL and OL normal forms as part of a verification 
condition cache of the Stainless verifier for Scala. The results show that 
both normalization algorithms substantially increase the cache hit ratio 
and improve the ability to prove verification conditions by simplification 
alone. To gain further insights, we also compare the algorithms on hard- 
ware circuit benchmarks, showing that normalization reduces circuit size 
and works well in the presence of sharing. 


1 Introduction 


Algorithms and techniques to solve and reduce formulas in propositional logic 
(and its generalizations) are a major field of study. They have prime relevance in 
SAT and SMT solving algorithms [2,8,31], in optimization of logical circuit size 
in hardware [25], in interactive theorem proving where propositional variables 
can represent assumptions and conclusions of theorems [23,35,43], for decision 
procedures in automated theorem proving [13,26,37,41,42], and in every sub- 
field of formal verification in general [27]. The propositional problem of satis- 
fiability is NP-complete, whereas validity and equivalence are coNP-complete. 
While heuristic techniques give useful results in practice, in this paper we investi- 
gate guaranteed worst-case polynomial-time deterministic algorithms. Such algo- 
rithms can serve as building blocks of more complex functionality, without cre- 
ating an unpredictable dependency. 

Recently, researchers proposed the use of certain non-distributive comple- 
mented lattice-like structures to compute normal forms of formulas [20]. These 
results appear to have a practical potential, but they have not been experi- 
mentally evaluated. Moreover, the proposed completeness characterization is in 
terms of “orthocomplemented bisemilattices” (OCBSL), which have a number
of counterintuitive properties. For example, the structure is not a lattice and
does not satisfy the absorption laws $x \land (x \lor y) = x$ and
$x \lor (x \land y) = x$. As a consequence, there is no natural semantic ordering
on formulas corresponding to implication, with $x \land y = x$ and
$x \lor y = y$ inducing two different relations.

Inspired by these limitations, we revisit results on lattices, which are much 
better behaving structures. We strengthen the OCBSL structure with the 
absorption law to consider the class of ortholattices, as summarized in Table 1. 
Ortholattices (OL) have a natural partial order for which $\land$, $\lor$ act as the
greatest lower bound and the least upper bound. They also satisfy de Morgan’s law,
allowing the elimination of one of the connectives in terms of the other two. On 
the other hand, ortholattices do not, in general, satisfy the distributivity law, 
which sets them apart from Boolean algebras. 

We present a new algorithm that computes a normal form for OL in quadratic 
time. The normal form is strictly stronger than the one for OCBSL: there are
terms in the language $\{\land, \lor, \neg\}$ that are distinct in OCBSL, but are
equal in OL. Checking equality of OL normal forms thus more precisely approximates
propositional formula equivalence. Both normal forms can be thought of as
strengthenings of the negation normal form.


Table 1. Laws of algebraic structures with signature $(S, \land, \lor, 0, 1, \neg)$.
Structures satisfying laws L1-L8 and L1'-L8' were called orthocomplemented
bisemilattices (OCBSL) in [20]. Those OCBSL that additionally satisfy L9 and L9'
are ortholattices (OL).


L1: $x \lor y = y \lor x$                          L1': $x \land y = y \land x$
L2: $x \lor (y \lor z) = (x \lor y) \lor z$        L2': $x \land (y \land z) = (x \land y) \land z$
L3: $x \lor x = x$                                 L3': $x \land x = x$
L4: $x \lor 1 = 1$                                 L4': $x \land 0 = 0$
L5: $x \lor 0 = x$                                 L5': $x \land 1 = x$
L6: $\neg\neg x = x$                               L6': same as L6
L7: $x \lor \neg x = 1$                            L7': $x \land \neg x = 0$
L8: $\neg(x \lor y) = \neg x \land \neg y$         L8': $\neg(x \land y) = \neg x \lor \neg y$
L9: $x \lor (x \land y) = x$                       L9': $x \land (x \lor y) = x$
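
As a sanity check on Table 1, the laws can be verified exhaustively on a small
finite ortholattice that is not a Boolean algebra. The sketch below uses the
six-element lattice with bottom 0, top 1, and four pairwise incomparable
elements a, a', b, b' (often called MO2); this choice of structure and the
function names are assumptions made for illustration, not taken from the paper.

```python
from itertools import product

ELEMS = ["0", "a", "a'", "b", "b'", "1"]
NEG = {"0": "1", "1": "0", "a": "a'", "a'": "a", "b": "b'", "b'": "b"}


def leq(x, y):
    # 0 is below everything, 1 is above everything, middle elements are incomparable
    return x == y or x == "0" or y == "1"


def join(x, y):
    if leq(x, y):
        return y
    if leq(y, x):
        return x
    return "1"   # two incomparable elements have only 1 as an upper bound


def meet(x, y):
    if leq(x, y):
        return x
    if leq(y, x):
        return y
    return "0"   # two incomparable elements have only 0 as a lower bound


def laws_hold():
    """Check L1-L9 and L1'-L9' for every assignment of x, y, z."""
    for x, y, z in product(ELEMS, repeat=3):
        checks = [
            join(x, y) == join(y, x), meet(x, y) == meet(y, x),            # L1
            join(x, join(y, z)) == join(join(x, y), z),                    # L2
            meet(x, meet(y, z)) == meet(meet(x, y), z),                    # L2'
            join(x, x) == x, meet(x, x) == x,                              # L3
            join(x, "1") == "1", meet(x, "0") == "0",                      # L4
            join(x, "0") == x, meet(x, "1") == x,                          # L5
            NEG[NEG[x]] == x,                                              # L6
            join(x, NEG[x]) == "1", meet(x, NEG[x]) == "0",                # L7
            NEG[join(x, y)] == meet(NEG[x], NEG[y]),                       # L8
            NEG[meet(x, y)] == join(NEG[x], NEG[y]),                       # L8'
            join(x, meet(x, y)) == x, meet(x, join(x, y)) == x,            # L9
        ]
        if not all(checks):
            return False
    return True


def distributive():
    return all(
        meet(x, join(y, z)) == join(meet(x, y), meet(x, z))
        for x, y, z in product(ELEMS, repeat=3)
    )
```

All laws of Table 1 hold in this structure, while distributivity fails: with
x = a, y = b, z = b', the left-hand side is a and the right-hand side is 0.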


Example 1. Consider the formula $x \land (y \lor z)$. An OCBSL algorithm finds it
equivalent to

$x \land \neg(\neg y \land \neg z) \land x$

but it will consider these two formulas non-equivalent to

$x \land (u \lor x) \land (y \lor z)$

The OL algorithm will identify the equivalence of all three formulas, thanks to
the laws (L9, L9'). It will nonetheless consider them non-equivalent to

$(x \land y) \lor (x \land z)$

which a complete but exponential worst-case time algorithm for Boolean algebra
equalities, such as one implemented in SAT solvers, will identify as equivalent.
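
That the four formulas of Example 1 agree in Boolean algebra can be confirmed
with a truth table. The check below is the exponential, complete method that
the example contrasts with the normal form algorithms; it is a minimal sketch
and the function names are hypothetical.

```python
from itertools import product


def f1(u, x, y, z):
    return x and (y or z)


def f2(u, x, y, z):
    return x and not (not y and not z) and x


def f3(u, x, y, z):
    return x and (u or x) and (y or z)


def f4(u, x, y, z):
    return (x and y) or (x and z)


def equivalent(f, g):
    """Compare two formulas on all 2^4 assignments of (u, x, y, z)."""
    return all(
        bool(f(*vals)) == bool(g(*vals))
        for vals in product([False, True], repeat=4)
    )
```

All pairs among `f1` to `f4` are equivalent under this check, even though only
a complete procedure relates the distributed form `f4` to the others.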
A major practical question is the usefulness of such $O(n \log(n)^2)$ (OCBSL)
and $O(n^2)$ (OL) algorithms in verification. Are they as predictably efficient as
the theoretical analysis suggests? What benefits do they provide as a component 
of verification tools? To answer these questions, we implement both OCBSL and 
OL algorithms on directed acyclic graph representations of formulas. We deploy 
the algorithms in tools that manipulate formulas, most notably verification con- 
ditions in a program verifier, as well as combinational Boolean circuits. 


Contributions. We make the following contributions: 


— We present the first algorithm computing a normal form of ortholattice (OL) 
terms. The algorithm preserves the quadratic time for the decision problem 
of equality in free ortholattices [7]. The quadratic time remains even when 
the formula is given in a shared (DAG) representation. 

— We implement and experimentally evaluate both the new algorithm for the 
OL normal form and a previously known (weaker) OCBSL algorithm (shown 
to run in quasilinear time). Our evaluation (Sect. 6) includes:

• behavior on randomly generated formulas;
• scalability evaluation on normalizing circuits of size up to 10° gates;
• normalization for simplification and caching of verification conditions
when using the Stainless verifier, with both hard benchmarks (such as
a compression algorithm) and collections of student submissions for
programming assignments.
We show that OCBSL and OL both have notable potential in practice. 


1.1 Related Work 


The overarching perspective behind our paper is understanding polynomial-time 
normalization of boolean algebra terms. Given (co)NP-hardness of problems 
related to Boolean algebras, we look at subtheories given by a subset of Boolean 
algebra axioms, including structures such as lattices. Lattices themselves have 
many uses in program abstraction, including abstract interpretation [11] and 
model checking [14,18]. The theory of the word problem for lattices has been 
studied already by Whitman [44], who proposed a quadratic solution for the 
word problem for free lattices. Lattices alone do not incorporate the notion of a 
complement (negation). Whitman’s algorithm has been adapted and extended 
to finitely presented lattices [17] and other variants, and then to free ortholat- 
tices by Bruns [7]. We extend this last result to not only decide equality, but 
also to compute a normal form for free ortholattices and to circuit (DAG) rep- 
resentation of terms. An efficient normal form does not follow from an efficient 
equivalence checking, as there are many formulas in the same equivalence class. 
Normal form is particularly useful in applications such as formula caching, which 
we evaluate in Sect.6. For a weaker theory of OCBSL, the normal form algo- 
rithm was introduced in [20], without any experimental evaluation. The theory 
of ortholattices, even if it adds only one more axiom, is notably stronger and 
better understood. The underlying lattice structure makes it possible to draw on 
the body of work on using lattices to abstract systems and enable algorithmic 
verification. The support for graphs (instead of only terms) as a representation 
is of immense practical relevance, because expanding circuits into trees without
the use of auxiliary variables creates structures of astronomical size (Sect. 6).
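
The size gap between a shared (DAG) representation and its tree expansion can
be seen on a synthetic chain of gates in which every gate uses the previous one
twice. The term encoding as nested tuples and the function names below are
assumptions made for illustration; they are unrelated to the benchmarks of
Sect. 6.

```python
def chain(n):
    """Build t_0 = x and t_{i+1} = and(t_i, t_i), reusing the same sub-term object."""
    t = "x"
    for _ in range(n):
        t = ("and", t, t)
    return t


def dag_size(t):
    """Number of distinct nodes, counting each shared sub-term once."""
    seen = set()
    stack = [t]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if isinstance(node, tuple):
            stack.extend(node[1:])
    return len(seen)


def tree_size(t, memo=None):
    """Number of nodes of the fully expanded tree (memoized so it stays fast)."""
    memo = {} if memo is None else memo
    if not isinstance(t, tuple):
        return 1
    if id(t) not in memo:
        memo[id(t)] = 1 + sum(tree_size(child, memo) for child in t[1:])
    return memo[id(t)]
```

For `chain(40)` the DAG has 41 nodes, whereas the expanded tree has
2^41 - 1 nodes, which is why algorithms that only accept trees are impractical
on circuits with sharing.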

A notable normal form that decides equality for propositional logic (thus also
accounting for the distributivity law) is given by reduced ordered binary decision
diagrams (ROBDDs) [9]. ROBDDs are of great importance in verification, but can
be exponential in the size of the initial formula. Circuit synthesis and verification
tools such as ABC [6] use SAT solvers to optimize sub-circuits [45], which is an
approach to choose a trade-off between the completeness and cost of an
exponential-time algorithm. Boolean algebras are in correspondence with Boolean rings,
which replace the least upper bound operation $\lor$ with the symmetric
difference $\oplus$ (defined as $(p \land \neg q) \lor (\neg p \land q)$ and satisfying
$x \oplus x = 0$, corresponding to
the exclusive or in the two-element case). There have been proposals to exploit
the boolean ring structure in verification [12]. Polynomials over rings can also be 
used to obtain a normal form, but the polynomial canonical forms that we are 
aware of are exponential-sized. SMT solvers [2,34] extend SAT solvers, which 
makes them worst-case exponential (at best). We expect that our approach and 
algorithms could be used for preprocessing or representation, especially in non- 
clausal variants of SMT solvers [24,39]. In our evaluation, we apply formula 
normal forms to the problem of caching of verification conditions. Caching is 
often used in verification tools, including Dafny [28] and Stainless [22]. Our 
caching works on formulas and preserves the API of a constraint solver. It is 
thus fine grained and can be added to a program verifier or analyzer, regardless 
of whether it uses any other, domain-specific, forms of caching [29]. 


2 Preliminaries 


We present definitions and results necessary for the presentation of the
ortholattice (OL) normal form algorithm. We assume familiarity with term
rewriting and representation of terms as trees and directed acyclic graphs
[15,20]. We use first-order logic with equality (whose symbol is $=$). We
write $A \models F$ to mean that a first-order logic formula $F$ is a
consequence of (thus provable from) the set of formulas $A$.


Definition 1 (Terms). Consider an algebraic signature $S$. We use $T_S(X)$ to
denote the set of terms over $S$ with variables in $X$ (typically an arbitrary
countably infinite set, unless specified otherwise). Terms are constructed
inductively as trees. Leaves are labeled with constant symbols or variables.
Nodes are labeled with function symbols. If the label of a node is a
commutative function, the children of the node are considered as a set
(non-ordered) and otherwise as a list (ordered). We assume that commutative
symbols are denoted as such in the signature.


Definition 2 (The Word Problem). Consider an algebraic signature $S$ and
a set of equational axioms $E$ on $S$ (for example the theory of lattices or
ortholattices). The word problem for $E$ is the problem of determining, given
two terms $t_1, t_2 \in T_S(X)$, whether $E \models t_1 = t_2$.


Definition 3 (Normal Form). Consider an algebraic signature $S$ and a set of
equational axioms $E$ on $S$. A function $f : T_S(X) \to T_S(X)$ produces a
normal form for $E$ iff: $\forall t_1, t_2 \in T_S(X)$, $E \models t_1 = t_2$
is equivalent to $f(t_1) = f(t_2)$.


For $Z$ an arbitrary non-empty set and $(\sim) \subseteq Z \times Z$ an
equivalence relation on $Z$, we use a common notation: if $x \in Z$ then
$[x]_\sim = \{y \in Z \mid x \sim y\}$. Let
$Z/_\sim = \{[x]_\sim \mid x \in Z\}$.

We now briefly review key concepts of free algebras. Let $S$ be a signature
and $E$ be an equational theory over this signature. Consider the equivalence
relation on terms $p \sim_E q \iff (E \models p = q)$, and note that
$T_S(X)/_{\sim_E}$ is itself an $E$-algebra. A freely generated $E$-algebra,
denoted $F_E(X)$, is an algebra generated by variables in $X$ and isomorphic
to $T_S(X)/_{\sim_E}$, i.e. in which only the laws of all $E$-algebras hold.
There is always a homomorphism from a freely generated $E$-algebra to any
other $E$-algebra over $X$.

The set of terms $T_S(X)$ is also called the term algebra over $S$. It is the
algebra of all terms that contains no identity other than syntactic equality.
Given a (possibly free) algebra $A$ over $S$ and generated by $X$, there is a
natural homomorphism $\kappa_A$, in a sense an evaluation function, from
$T_S(X)$ to $A$. The word problem for a theory $E$ then consists in, given
$p, q \in T_S(X)$, deciding if $E \models p = q$, that is,
$\kappa_{F_E}(p) = \kappa_{F_E}(q)$.

In the sequel, we continue to use = to denote the equality symbol inside 
formulas as well as the usual identity of mathematical objects. We use == to 
specifically denote the computer-performed operation of structural equality on 
trees and sets, whereas === denotes reference equality of objects, meaning that 
a === b if and only if a and b denote the same object in memory. The distinction 
between == and === is relevant because == is a larger relation but may take 
linear or worse time to compute, whereas we assume === is constant time. 
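
To make the distinction between == and === concrete, here is a minimal Python
sketch. The Node class is a hypothetical illustration and is not taken from
the implementation described in this paper; Python's == plays the role of
structural equality and the `is` operator plays the role of reference
equality (===).

```python
class Node:
    """A term node with a label and an ordered tuple of children."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = tuple(children)

    def __eq__(self, other):
        # Structural equality: recursively compares labels and children,
        # so its cost grows with the size of the two trees.
        return (isinstance(other, Node)
                and self.label == other.label
                and self.children == other.children)

    def __hash__(self):
        return hash((self.label, self.children))


def build():
    # Builds a fresh copy of the term  x or (y and z)  on every call.
    return Node("or", [Node("x"), Node("and", [Node("y"), Node("z")])])


a, b = build(), build()
structurally_equal = (a == b)   # same shape and labels
same_object = (a is b)          # two distinct objects in memory
```

Here `a == b` holds while `a is b` does not, which is why memoization keyed on
object identity can be constant time but may miss structurally equal
duplicates.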




Lattices. Lattices [4] are well-studied structures with signature
$(\land, \lor)$ satisfying laws L1-L3, L9, L1’-L3’ and L9’ from Table 1. In
particular, they do not have a complement operation, $\neg$, in the
signature. Lattices can also be viewed as a special kind of partially ordered
sets with an order relation defined by $(a \le b) \iff (a \land b = a)$,
where the last condition is also equivalent to $(a \lor b = b)$, given the
axioms of lattices. When applied to two-element Boolean algebras, this order
relation corresponds to logical implication in propositional logic. A bounded
lattice is a lattice with maximal and minimal elements $1$ and $0$. The word
problem for lattices has been solved by Whitman [44] through an algorithm
that decides the $\le$ relation and is based on the following properties of
free lattices:


    (1)  $s_1 \lor \dots \lor s_m \le t \iff \forall i.\ s_i \le t$
    (2)  $s \le t_1 \land \dots \land t_n \iff \forall j.\ s \le t_j$
    (3)  $s_1 \land \dots \land s_m \le y \iff \exists i.\ s_i \le y$
    (4)  $x \le t_1 \lor \dots \lor t_n \iff \exists j.\ x \le t_j$
    (w)  $s \le t \iff (\exists i.\ s_i \le t) \lor (\exists j.\ s \le t_j)$,
         with $s = (s_1 \land \dots \land s_m)$ and $t = (t_1 \lor \dots \lor t_n)$

where $x$ and $y$ denote variables and $s$ and $t$ terms. The first four
properties are direct consequences of the axioms of lattices. (w) above is
Whitman’s property and holds in free lattices (not in all lattices). Applying
the above rules recursively decides the $\le$ relation.
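
The recursive procedure induced by rules (1)-(4) and (w) can be sketched in
Python as follows. This is a minimal illustration under assumed conventions
that are not from the paper: variables are strings, and ("and", t1, ..., tn)
and ("or", t1, ..., tn) are n-ary meets and joins. Results are cached with
functools.lru_cache, which compares arguments structurally; this differs from
the reference-based memoization discussed later in the paper.

```python
from functools import lru_cache


def is_meet(t):
    return isinstance(t, tuple) and t[0] == "and"


def is_join(t):
    return isinstance(t, tuple) and t[0] == "or"


@lru_cache(maxsize=None)
def leq(s, t):
    """Decide s <= t in the free lattice using rules (1)-(4) and (w)."""
    if is_join(s):                      # rule (1): every disjunct is below t
        return all(leq(si, t) for si in s[1:])
    if is_meet(t):                      # rule (2): s is below every conjunct
        return all(leq(s, tj) for tj in t[1:])
    if is_meet(s) and is_join(t):       # rule (w): Whitman's property
        return (any(leq(si, t) for si in s[1:])
                or any(leq(s, tj) for tj in t[1:]))
    if is_meet(s):                      # rule (3): t is a variable
        return any(leq(si, t) for si in s[1:])
    if is_join(t):                      # rule (4): s is a variable
        return any(leq(s, tj) for tj in t[1:])
    return s == t                       # two variables


# Distributivity does not hold in free lattices: only one direction is valid.
lhs = ("and", "x", ("or", "y", "z"))
rhs = ("or", ("and", "x", "y"), ("and", "x", "z"))
one_direction = leq(rhs, lhs)
other_direction = leq(lhs, rhs)
```

With this encoding, `leq(rhs, lhs)` holds but `leq(lhs, rhs)` does not,
reflecting the absence of the distributivity law.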


Orthocomplemented Bisemilattices (OCBSL). OCBSL [20] are also a
weakening of Boolean algebras (and, in fact, a subtheory of ortholattices).
They satisfy laws L1-L8, L1’-L8’ but not the absorption law (L9, L9’). This
implies in particular that OCBSL do not have a canonical order relation as
lattices do, but rather have two, in general distinct, relations:


    $a \sqsubseteq_\land b \iff a \land b = a$
    $a \sqsubseteq_\lor b \iff a \lor b = b$


If we add absorption axioms, $a \land b = a$ implies
$a \lor b = (a \land b) \lor b = b$ (and dually), so the structure becomes a
lattice. The algorithm presented in [20] does not rely on lattice properties.
Instead, it is proven that the axioms of OCBSL can be extended to a term
rewriting system which is confluent and terminating, and hence admits a
normal form. Using variants of algorithms on labelled trees to handle
commutativity, this normal form can be computed in quasilinear time
$O(n \log^2(n))$. In contrast, in the case of free lattices, there exists no
confluent and terminating term rewriting system [16].


3 Deriving an Ortholattice Normal Form Algorithm 


Ortholattices [3, Chapter II.1] are structures satisfying laws L1-L9, L1’-L9’
of Table 1. An ortholattice (OL) need not be a Boolean algebra, nor an
orthomodular lattice; the smallest example of such an OL is “Benzene” (O6),
with elements $\{0, a, b, \neg b, \neg a, 1\}$ where $a \le b$ [5]. The word
problem for free ortholattices, which checks if a given equation is true, has
been shown to be solvable in quadratic time by Bruns [7]. In this section, we
go further by presenting an efficient computation of normal forms, which
reduces the word problem to syntactic equality. In addition, normal forms can
be efficiently used for formula simplification and caching, unlike the
equality procedure itself.


Definition 4. For a set of variables $X$, we define a disjoint set of the same
cardinality $X'$ with a bijective function $(\cdot)' : X \to X'$. Denote by
$L$ the theory of bounded lattices and $OL$ the theory of ortholattices.
Define $F_L$, $F_{OL}$ to be their free lattices and $T_L$ and $T_{OL}$ to be
the sets of terms over their respective signatures. Define $\le_L$ as the
relation on $T_L$ such that $s \le_L t \iff \kappa_{F_L}(s) \le \kappa_{F_L}(t)$
and $\le_{OL}$ analogously by
$s \le_{OL} t \iff \kappa_{F_{OL}}(s) \le \kappa_{F_{OL}}(t)$, where $\kappa$
denotes natural homomorphisms as introduced in the previous section.


Note: $p \le_{OL} q \iff (E_{OL} \models (p \land q = p))$ where $E_{OL}$ is
the set of axioms of Table 1.


3.1 Deciding $\le_{OL}$ by Reduction to Bounded Lattices


We consider $T_L(X \cup X')$ as a subset of $T_{OL}(X)$ via the injective
inclusion on variables mapping $x \mapsto x$ and $x' \mapsto \neg x$. We also
define a function $\delta : T_{OL}(X) \to T_L(X \cup X')$ as the
transformation into negation normal form, using laws L6 (double negation
elimination), L8 and L8’ (de Morgan’s laws).

We define a set $R \subseteq T_L(X \cup X')$ of terms reduced with respect to
the contradiction laws (L7 and L7’). These imply that, e.g., given a term
$a \lor b$, if $\neg b \le (a \lor b)$, then, since also $b \le a \lor b$, we
have $1 = b \lor \neg b \le (a \lor b)$. The following inductive definition
induces an algorithm to check $x \in R$, meaning that such reductions do not
apply inside $x$:


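    $0, 1, x, x' \in R$   (for $x \in X$)
    $a \lor b \in R \iff a \in R,\ b \in R,\ \delta(\neg a) \not\le_L a \lor b,\ \delta(\neg b) \not\le_L a \lor b$
    $a \land b \in R \iff a \in R,\ b \in R,\ \delta(\neg a) \not\ge_L a \land b,\ \delta(\neg b) \not\ge_L a \land b$


Above, $\le_L$ is the order relation on lattices, $x \ge_L y$ denotes
$y \le_L x$, and $\not\le_L$, $\not\ge_L$ are the negations of those
conditions: $x \not\le_L y$ iff not $x \le_L y$, whereas $x \not\ge_L y$ iff
not $y \le_L x$.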

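We also define $\beta : T_L(X \cup X') \to R$ by:


    $\beta(0) = 0$, $\beta(1) = 1$, $\beta(x) = x$, $\beta(x') = x'$   (for $x \in X$)

    $\beta(a \lor b) = \begin{cases} \beta(a) \lor \beta(b) & \text{if } \beta(a) \lor \beta(b) \in R \\ 1 & \text{otherwise} \end{cases}$

    $\beta(a \land b) = \begin{cases} \beta(a) \land \beta(b) & \text{if } \beta(a) \land \beta(b) \in R \\ 0 & \text{otherwise} \end{cases}$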


Example 2. We have $\beta((x \land \neg y) \lor (\neg x \lor y)) = 1$ because
$\delta(\neg(x \land \neg y)) = \neg x \lor y$ and
$\neg x \lor y \le_L (x \land \neg y) \lor \neg x \lor y$.

Note that it is generally not sufficient to check only for
$\delta(\neg a) \not\le_L b$ for larger examples. In particular, if
$\delta(\neg a)$ is itself a conjunction, by Whitman’s property, the condition
$\delta(\neg a) \not\le_L (a \lor b)$ is not in general equivalent to having
both $\delta(\neg a) \not\le_L b$ and $\delta(\neg a) \not\le_L a$.


We next reformulate the theorem from Bruns [7]. A key construction from
the proof is the following lemma.


Lemma 1. $R/_{\sim_L}$ is an ortholattice isomorphic to $F_{OL}(X)$.

Theorem 1. Let $s, t \in T_{OL}(X)$. Then,
$s \le_{OL} t \iff \beta(\delta(s)) \le_L \beta(\delta(t))$.


Proof. We sketch and adapt the original proof. Intuitively, computing
$\beta(\delta(s)) \le_L \beta(\delta(t))$ should be sufficient to compute the
$\le_{OL}$ relation: $\delta$ reduces terms to normal forms modulo rules L6
(double negation elimination) and L8, L8’ (De Morgan’s laws), and then $\beta$
takes care of rule L7 (contradiction). The only rules left are rules from
(bounded) lattices, which should be dealt with by $\le_L$. From Lemma 1, the
fact that $\beta$ factors in the evaluation function $\kappa_{F_{OL}}$ (i.e.
is equivalence preserving) and properties of free algebras, it can be shown
that $\kappa_{F_{OL}} = \gamma \circ \kappa_{\sim_L} \circ \beta \circ \delta$,
where $\kappa_{\sim_L}(x) = [x]_{\sim_L}$ and
$\gamma : R/_{\sim_L} \to F_{OL}(X)$ is an isomorphism. Hence


    $\kappa_{F_{OL}}(s) \le \kappa_{F_{OL}}(t) \iff [\beta(\delta(s))]_{\sim_L} \le [\beta(\delta(t))]_{\sim_L}$


which is equivalent to
$s \le_{OL} t \iff \beta(\delta(s)) \le_L \beta(\delta(t))$.
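
Theorem 1 suggests a direct, if naive, decision procedure. The Python sketch
below illustrates it under assumed conventions that are not from the paper:
variables are strings, the primed copy of a variable x is the string "x'",
the constants are the integers 0 and 1, and compound terms are the binary
tuples ("not", t), ("and", a, b) and ("or", a, b). It caches the lattice order
structurally and does not implement the reference-based memoization needed
for the quadratic bound.

```python
from functools import lru_cache


def delta(t, neg=False):
    """Negation normal form: push negations to the leaves (L6, L8, L8')."""
    if isinstance(t, int):
        return 1 - t if neg else t
    if isinstance(t, str):
        return t + "'" if neg else t
    if t[0] == "not":
        return delta(t[1], not neg)
    kind = t[0] if not neg else ("or" if t[0] == "and" else "and")
    return (kind, delta(t[1], neg), delta(t[2], neg))


def op(t):
    """Negate a term that is already in negation normal form."""
    if isinstance(t, int):
        return 1 - t
    if isinstance(t, str):
        return t[:-1] if t.endswith("'") else t + "'"
    return ("or" if t[0] == "and" else "and", op(t[1]), op(t[2]))


@lru_cache(maxsize=None)
def leq(s, t):
    """Whitman-style test of s <=_L t on bounded lattice terms."""
    if s == 0 or t == 1:
        return True
    s_op = s[0] if isinstance(s, tuple) else None
    t_op = t[0] if isinstance(t, tuple) else None
    if s_op == "or":
        return leq(s[1], t) and leq(s[2], t)
    if t_op == "and":
        return leq(s, t[1]) and leq(s, t[2])
    if s_op == "and" and (leq(s[1], t) or leq(s[2], t)):
        return True
    if t_op == "or" and (leq(s, t[1]) or leq(s, t[2])):
        return True
    return s == t


def beta(t):
    """Collapse subterms that violate the contradiction laws (L7, L7')."""
    if not isinstance(t, tuple):
        return t
    a, b = beta(t[1]), beta(t[2])
    r = (t[0], a, b)
    if t[0] == "or":
        return 1 if leq(op(a), r) or leq(op(b), r) else r
    return 0 if leq(r, op(a)) or leq(r, op(b)) else r


def leq_ol(s, t):
    """s <=_OL t, decided as beta(delta(s)) <=_L beta(delta(t))."""
    return leq(beta(delta(s)), beta(delta(t)))


example2 = beta(delta(("or", ("and", "x", ("not", "y")),
                             ("or", ("not", "x"), "y"))))
```

On the term of Example 2, `example2` evaluates to the constant 1, and
`leq_ol` rejects the distributivity inequality, as expected for ortholattices.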


3.2 Reduction to Normal Form 


To obtain a normal form for $T_{OL}(X)$, we will compose $\delta$ and $\beta$
with a normal form function for $T_L(X \cup X')$. A disjunction
$a = a_1 \lor \dots \lor a_m$ (and dually for a conjunction) is in normal form
for $\le_L$ if and only if the following two properties hold [15, p. 17]:


1. if $a_i = (a_{i1} \land \dots \land a_{in})$, then for all $j$, $a_{ij} \not\le_L a$


2. $(a_1, \dots, a_m)$ forms an antichain (if $i \ne j$ then $a_i \not\le_L a_j$)


We now show how to reduce a term in $R$ so that it satisfies both properties
using a function $\zeta$ that enforces property 1, and then $\eta$ that
additionally enforces property 2. The functions operate dually on $\land$ and
$\lor$; we specify them only on $\lor$ cases for brevity.


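Enforcing Property 1. Define $\zeta : R \to R$ recursively such that:

    $\zeta(a_1 \lor \dots \lor a_m) = \begin{cases} \zeta(a_1 \lor \dots \lor a_{ij} \lor \dots \lor a_m) & \text{if } a_i = (a_{i1} \land \dots \land a_{in}) \text{ and } a_{ij} \le_L a_1 \lor \dots \lor a_m \\ \zeta(a_1) \lor \dots \lor \zeta(a_m) & \text{otherwise} \end{cases}$

(dually for $\land$). It follows that $s \sim_L \zeta(s)$ for every term $s$
because $a_{ij} \le_L a_1 \lor \dots \lor a_m$ implies
$a_1 \lor \dots \lor a_m = a_1 \lor \dots \lor a_m \lor a_{ij}$ and
$a_i \lor a_{ij} = a_{ij}$ by absorption.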

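Enforcing Property 2 (Antichain). Define $\eta : R \to R$ such that

    $\eta(a_1 \lor \dots \lor a_m) = \begin{cases} \eta(a_1 \lor \dots \lor a_{i-1} \lor a_{i+1} \lor \dots \lor a_m) & \text{if } a_i \le_L a_j,\ i \ne j \\ \eta(a_1) \lor \dots \lor \eta(a_m) & \text{otherwise} \end{cases}$

We have $s \sim_L \eta(s)$ for every term $s$ because $a_i \le_L a_j$ means
$a_i \lor a_j = a_j$.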
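
The two functions can be sketched in Python on plain lattice terms. The
encoding is an assumption made for illustration only: variables are strings
and ("or", t1, ..., tn), ("and", t1, ..., tn) are n-ary nodes. The sketch also
unwraps a node left with a single child, a convenience that the definitions
above leave implicit, and it ignores the restriction to $R$ as well as any
memoization.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def leq(s, t):
    """Whitman's test for s <= t on free lattice terms."""
    s_op = s[0] if isinstance(s, tuple) else None
    t_op = t[0] if isinstance(t, tuple) else None
    if s_op == "or":
        return all(leq(si, t) for si in s[1:])
    if t_op == "and":
        return all(leq(s, tj) for tj in t[1:])
    if s_op == "and" and any(leq(si, t) for si in s[1:]):
        return True
    if t_op == "or" and any(leq(s, tj) for tj in t[1:]):
        return True
    return s == t


def zeta(t):
    """Property 1: replace a child by one of its own children when absorbed."""
    if not isinstance(t, tuple):
        return t
    kind, kids = t[0], t[1:]
    dual = "and" if kind == "or" else "or"
    for i, ai in enumerate(kids):
        if isinstance(ai, tuple) and ai[0] == dual:
            for aij in ai[1:]:
                absorbed = leq(aij, t) if kind == "or" else leq(t, aij)
                if absorbed:
                    return zeta((kind,) + kids[:i] + (aij,) + kids[i + 1:])
    return (kind,) + tuple(zeta(a) for a in kids)


def eta(t):
    """Property 2: drop children until the remaining ones form an antichain."""
    if not isinstance(t, tuple):
        return t
    kind, kids = t[0], t[1:]
    if len(kids) == 1:                  # unwrap a node with a single child
        return eta(kids[0])
    for i, ai in enumerate(kids):
        for j, aj in enumerate(kids):
            redundant = leq(ai, aj) if kind == "or" else leq(aj, ai)
            if i != j and redundant:
                return eta((kind,) + kids[:i] + kids[i + 1:])
    return (kind,) + tuple(eta(a) for a in kids)


# The term [(a or b) and (a or c)] or b
term = ("or", ("and", ("or", "a", "b"), ("or", "a", "c")), "b")
after_zeta = zeta(term)
after_eta = eta(after_zeta)
```

On this term, `after_zeta` is `(a or b) or b` and `after_eta` is `a or b`.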


Example 3. We have:
$\eta(\zeta([(a \lor b) \land (a \lor c)] \lor b)) = \eta((a \lor b) \lor b) = a \lor b$.
Indeed, the first equality follows from


    $(a \lor b) \le_L [(a \lor b) \land (a \lor c)] \lor b$


and the second from $b \le_L (a \lor b)$.

Denote by $R'$ the subset of $R$ containing the terms satisfying property 1
and $R''$ the subset of $R'$ of terms satisfying property 2. It is easy to see
that $\zeta$ is actually $R \to R'$ and $\eta$ can be restricted to
$R' \to R''$. Moreover, $s, t \in R''$ and $s \sim_L t$ implies $s = t$.
Recall that $\forall w \in T_{OL}(X).\ \beta(\delta(w)) \in R$. Since $\beta$
and $\delta$ are equivalence preserving, $\forall w_1, w_2 \in T_{OL}(X)$


    $w_1 \sim_{OL} w_2 \iff \beta(\delta(w_1)) \sim_{OL} \beta(\delta(w_2))$


Moreover, since (by Lemma 1) $R/_{\sim_L}$ is an ortholattice, we have


    $\beta(\delta(w_1)) \sim_{OL} \beta(\delta(w_2)) \iff \beta(\delta(w_1)) \sim_L \beta(\delta(w_2))$


i.e. in $R$, $\sim_{OL} = \sim_L$. Then,


    $\beta(\delta(w_1)) \sim_L \beta(\delta(w_2)) \iff \eta(\zeta(\beta(\delta(w_1)))) \sim_L \eta(\zeta(\beta(\delta(w_2))))$

and since both $\eta(\zeta(\beta(\delta(w_1)))) \in R''$ and
$\eta(\zeta(\beta(\delta(w_2)))) \in R''$,


    $\eta(\zeta(\beta(\delta(w_1)))) = \eta(\zeta(\beta(\delta(w_2))))$

We finally conclude:


Theorem 2. $NF_{OL} = \eta \circ \zeta \circ \beta \circ \delta$ is a
computable normal form function for ortholattices.


3.3 Complexity and Normal Form Size 


Before presenting the algorithm in more detail, we argue why the normal form 
function from the previous section can be computed efficiently. We assume a 
RAM model and hence that creating new nodes in the tree representation of 
terms can be done in constant time. 

Note that the size of the output of each of $\delta$, $\beta$, $\zeta$ and
$\eta$ is linearly bounded by the size of the input. Thus, the asymptotic
runtime complexity of the composition is the sum of the runtimes of these
functions. Recall that $\delta$ (negation normal form) is computable in linear
time and $\zeta$ and $\eta$ are both computable in worst-case quadratic time,
plus the time needed to compute $\le_L$. Then, $\beta$, $R$ and $\le_L$ are
each computable in constant time plus the time needed for the mutually
recursive calls. While a direct recursive implementation would be
exponential, observe that the computation time of $R$ and $\beta$ is
proportional to the total number of times they get called. If we store
(memoize) the results of the functions for each different input, this time
can be bounded by the total number of different sub-nodes that are part of
the input or which we create during the algorithm’s execution. Similarly,
$\le_L$ needs to be applied to, at worst, every pair of such sub-nodes.
Consequently, if we memoize the result of each of these functions at all
their calls, we may expect to obtain at most quadratic time to compute them
on all the sub-nodes of a formula.

The above argument is, however, not entirely sufficient, because computing
$R(a \land b)$ requires creating the new nodes $\neg a$ and $\neg b$ and then
computing their negation normal form, which again creates new nodes. Indeed,
note that, for memoization, we need to rely on reference (pointer) equality,
as structural equality would take a linear amount of time to compute (for a
total cubic time). Hence, to obtain quadratic time and space, we need to be
able to negate a node in negation normal form without creating too many new
nodes in memory. To do so, define $op : T_L(X \cup X') \to T_L(X \cup X')$ by


    $op(x) = x'$,   $op(x') = x$
    $op(a \land b) = op(a) \lor op(b)$
    $op(a \lor b) = op(a) \land op(b)$


$op(a)$ is functionally equal to $\delta(\neg a)$, but has the crucial
property that


    children(op(τ)) === op[children(τ)]


where τ denotes a formal conjunction or disjunction and children(τ) is the set
of children of τ as a tree. op can be efficiently memoized. Moreover, it can
be bijectively memoized: if op(a) = b we shall also store op(b) = a. We thus
obtain op(children(op(τ))) === children(τ). In this approach we are guaranteed
to never instantiate any node beyond the $n$ subnodes of the original formula
(in negation normal form) and their opposites, for a total of $2n$ nodes.
Hence, we only ever need to call $op$, $R$ and $\beta$ on up to $2n$ different
inputs and $\le_L$ on up to $4n^2$ different inputs, guaranteeing a final
quadratic running time.
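
A minimal Python sketch of such a bijectively memoized op is given below. The
Node class and its field names are hypothetical and chosen for illustration;
the memo is stored in a field of each node, and Python's `is` operator stands
for reference equality (===).

```python
class Node:
    """NNF node; kind is 'var', 'nvar' (a primed variable), 'and' or 'or'."""

    def __init__(self, kind, name=None, children=()):
        self.kind = kind
        self.name = name
        self.children = tuple(children)
        self.opposite = None            # memo field for op


FLIP = {"var": "nvar", "nvar": "var", "and": "or", "or": "and"}


def op(t):
    """Return the unique negated copy of t, creating it at most once."""
    if t.opposite is None:
        # Children are negated through op as well, so shared children of t
        # become shared children of the negated node.
        o = Node(FLIP[t.kind], t.name, [op(c) for c in t.children])
        t.opposite, o.opposite = o, t   # bijective memoization
    return t.opposite


# A small DAG in which the conjunction s is used twice.
x, y = Node("var", "x"), Node("var", "y")
s = Node("and", children=[x, y])
top = Node("or", children=[s, s])
negated = op(top)
```

Calling op twice returns the original object (`op(negated) is top`), and the
sharing of `s` is preserved in the negated DAG, so at most twice as many nodes
as in the input are ever allocated.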


Minimal Size. Finally, as none of $\delta$, $\beta$, $\zeta$ and $\eta$ ever
increases the size of the formula (in terms of the number of literals,
conjunctions and disjunctions), neither does $NF_{OL}$. Consequently, for any
term $w$, $NF_{OL}(w)$ is one of the smallest terms equivalent to $w$. Indeed,
let $w_{min} \sim_{OL} w$ be such that $w_{min}$ is a term of smallest size in
the equivalence class of $w$. In particular, $NF_{OL}(w_{min})$ cannot be
smaller than $w_{min}$ (because $w_{min}$ is minimal in the class) nor larger
(because $NF_{OL}$ is size non-increasing). Since
$NF_{OL}(w) = NF_{OL}(w_{min})$, $NF_{OL}(w)$ is of minimal size.


Theorem 3. The normal form from Theorem 2 can be computed by an algorithm
running in time and space $O(n^2)$. Moreover, the resulting normal form is
guaranteed to be smallest in the equivalence class of the input term.


4 Algorithm with Memoization and Structure Sharing 


To obtain a practical realization of Theorem 3, we need to address two main 
challenges. First, as explained in the previous section, we need to memoize the 
result of some functions to avoid exponential blowup. Second, we want to make 
the procedure compatible with structure sharing, since it is an important feature 
for many applications. 

By memoization we mean modifying a function so that it saves the result of
the calls for each argument, so that they can be found without future
recomputations. Results of function calls can be stored in a map. For
single-argument functions we find it is typically more efficient to introduce
a field in each object to hold the result of calling a function on it. By
structure sharing we mean the possibility to reuse subformulas multiple times
in the description of a logical expression. In the case of the signature
$(\land, \lor, \neg)$, such expressions can be viewed as combinational Boolean
circuits. We represent such terms using directed acyclic graph (DAG)
reference structures instead of tree structures.

Circuits can be exponentially more succinct than equivalent formulas, but not
all formula rewrites are efficient in the presence of structure sharing
(consider, for example, rules with substitution such as
$x \land F \sim x \land F[x := 1]$, where $F$ may also be referred to
somewhere else). Structure sharing is thus non-trivial to maintain throughout
all representations and transformations. Indeed, making a naive recursive
modification of a circuit will unfold the DAG into a tree, often causing an
exponential increase in space. Doing so optimally also requires the use of
memoization. Moreover, the choice of representations and data structures is
critical.
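
The gap between a circuit and its expanded tree, and the way memoization
preserves sharing during a transformation, can be illustrated with a small
Python sketch. The Node class and the label-swapping transformation are
hypothetical examples, not part of the paper's implementation; memo tables are
keyed by object identity via `id`.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = tuple(children)


def dag_size(root):
    """Number of distinct node objects reachable from root."""
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if id(n) not in seen:
            seen.add(id(n))
            stack.extend(n.children)
    return len(seen)


def tree_size(n, memo=None):
    """Size of the fully expanded tree, computed without expanding it."""
    memo = {} if memo is None else memo
    if id(n) not in memo:
        memo[id(n)] = 1 + sum(tree_size(c, memo) for c in n.children)
    return memo[id(n)]


def swap_gates(n, memo=None):
    """Rebuild the DAG with 'and'/'or' swapped, visiting each node once."""
    memo = {} if memo is None else memo
    if id(n) not in memo:
        label = {"and": "or", "or": "and"}.get(n.label, n.label)
        memo[id(n)] = Node(label, [swap_gates(c, memo) for c in n.children])
    return memo[id(n)]


# A chain of 200 gates, each of which uses the previous gate twice.
g = Node("x")
for _ in range(200):
    g = Node("and", [g, g])
swapped = swap_gates(g)
```

Here the circuit has 201 nodes while its expanded tree has $2^{201} - 1$
nodes; the memoized transformation returns a circuit that again has 201
nodes, whereas a naive recursive copy would build the whole tree.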

We show that it is possible to make both algorithms fully compatible with 
structure sharing without ever creating node duplicates. The algorithm ensures 
that the resulting circuits will contain a smaller number of subnodes, preserve 
equivalence, and enforce that two circuits have the same representation if and 
only if they describe the same term (by the laws of OL). 


Algorithm 1: Datastructure for Formulas

    numberOfFormulas ← 0
    Datastructure AIGFormula
        val uniqueId: Int ← numberOfFormulas++   // get fresh ID on node creation
        var inverse: AIGFormula ← null
        var normal: AIGFormula ← null
        var smaller: Set[Int] ← ∅                // sparse bitset
        var notSmaller: Set[Int] ← ∅             // sparse bitset

    case Variable(id: String, polarity: Bool) of AIGFormula
    case Literal(polarity: Bool) of AIGFormula
    case Conjunction(children: List[AIGFormula], polarity: Bool) of AIGFormula
    val Positive: Bool = True; val Negative: Bool = False


Algorithm 2: Computing Negations

    def inverse(τ)   // AIGFormula -> AIGFormula
        if isDefined(τ.inverse) then
            return τ.inverse
        else
            π ← τ.copy(polarity = !τ.polarity)
            τ.inverse ← π
            π.inverse ← τ
            return π


Algorithm 3: Computing ≤

    def ≤(τ, π)   // AIGFormula -> AIGFormula -> Bool
        if τ.smaller contains π.uniqueId then return True
        else if τ.notSmaller contains π.uniqueId then return False
        else
            r ← match (τ, π):
                case (lhs, Conjunction(children, Positive)):
                    ∀c ∈ children. τ ≤ c
                case (Conjunction(children, Negative), rhs):
                    ∀c ∈ children. inverse(c) ≤ π
                case (Variable(id), Conjunction(children, Negative)):
                    ∃c ∈ children. τ ≤ inverse(c)
                case (Conjunction(children, Positive), Variable(id)):
                    ∃c ∈ children. c ≤ π
                case (Conjunction(tauCh, Positive), Conjunction(piCh, Negative)):
                    // would cause exponential explosion without memoization:
                    (∃c ∈ tauCh. c ≤ π) ∨ (∃c ∈ piCh. τ ≤ inverse(c))
                case (Variable(id1), Variable(id2)):
                    id1 == id2
            if r then τ.smaller += π.uniqueId
            else τ.notSmaller += π.uniqueId
            return r


Pseudocode. Algorithms 1, 2, 3 and 4 present a pseudocode implementation of
the normal form function from Theorem 2. To more easily maintain structure
sharing and gain performance, we move away from the negation normal form
representation and prefer to use a representation of formulas similar to AIG
(And-Inverter Graph) where a formula is either a Conjunction, a Variable or a
Literal and contains a boolean value telling if the formula is positive or
negative (see Algorithm 1). This implies that $\delta$ needs to transform
arbitrary Boolean formulas into AIGFormulas instead of negation normal forms.
Fortunately, AIGFormulas can be efficiently translated to NNF (and back) so
we can view them as an alternative representation of terms in
$T_L(X \cup X')$. For the sake of space, we do not show the reduction from
general formula trees on the signature $(\land, \lor, \neg)$ and work directly
with AIGFormulas, but the implementation needs memoization to avoid
exponential duplication in the presence of structure sharing.

Recall that computing $R$ requires taking the negation of some formulas, and
projecting them back into $T_L(X \cup X')$ with $\delta$. Using AIGFormula
makes it possible to always take the negation of a formula in constant time
and space. The corresponding function inverse(τ) is in Algorithm 2, and
corresponds to the op function from the previous section. The memoization
ensures that for all τ, inverse(inverse(τ)) === τ, and our choice of data
structure ensures that children(inverse(τ)) === children(τ). Those two
properties guarantee that any sequence of accesses to children and inverses
of τ will always yield a formula object within the original DAG, or its
single inverse copy. In particular, regardless of structure sharing in the
input structure, we never need to store in memory more than twice the total
number of formula nodes in the input. As explained in Sect. 3.3, a similar
condition could be made to hold with NNF, but we believe it is more
complicated and less efficient when implemented.

Function $\le$ in Algorithm 3 is based on Whitman’s algorithm adapted to
AIGFormula. For memoization, because the function takes two arguments, we
store in each node the set of nodes it is smaller than, and the set of nodes
it is not smaller than, using two sets. Note that storing and accessing
values in a set (even a hash set) is only as efficient as computing the
equality relation on two objects. Because structural equality == takes linear
time to compute, we use referential equality with the uniqueId of each
formula (declared in Algorithm 1). We found that using sparse bit sets yields
the best performance.
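
The data structure, the constant-time negation and the memoized order test
can be combined in a short Python sketch. This is an illustrative port under
stated assumptions and not the authors' Scala implementation: the class name
AIG, its field names, the treatment of literals as bottom and top, and the
comparison of polarity between two variables are choices made here; plain
Python sets stand in for sparse bit sets.

```python
import itertools

_ids = itertools.count()


class AIG:
    """kind is 'var', 'lit' or 'and'; polarity False means negated."""

    def __init__(self, kind, polarity=True, name=None, children=()):
        self.kind, self.polarity, self.name = kind, polarity, name
        self.children = tuple(children)
        self.uid = next(_ids)           # fresh ID on node creation
        self.inverse = None
        self.smaller = set()            # uids of nodes p with self <= p
        self.not_smaller = set()        # uids of nodes p with not self <= p


def inverse(t):
    """Negation sharing the children; created once and linked both ways."""
    if t.inverse is None:
        p = AIG(t.kind, not t.polarity, t.name, t.children)
        t.inverse, p.inverse = p, t
    return t.inverse


def leq(t, p):
    if p.uid in t.smaller:
        return True
    if p.uid in t.not_smaller:
        return False
    r = _leq(t, p)
    (t.smaller if r else t.not_smaller).add(p.uid)
    return r


def _leq(t, p):
    if (t.kind == "lit" and not t.polarity) or (p.kind == "lit" and p.polarity):
        return True                                   # 0 <= p and t <= 1
    if p.kind == "and" and p.polarity:                # right side is a meet
        return all(leq(t, c) for c in p.children)
    if t.kind == "and" and not t.polarity:            # left side is a join
        return all(leq(inverse(c), p) for c in t.children)
    t_meet = t.kind == "and"                          # positive at this point
    p_join = p.kind == "and"                          # negative at this point
    if t_meet and any(leq(c, p) for c in t.children):
        return True
    if p_join and any(leq(t, inverse(c)) for c in p.children):
        return True
    if t_meet or p_join:
        return False
    return (t.kind, t.name, t.polarity) == (p.kind, p.name, p.polarity)


x, y = AIG("var", name="x"), AIG("var", name="y")
x_and_y = AIG("and", children=[x, y])
x_or_y = inverse(AIG("and", children=[inverse(x), inverse(y)]))
```

A disjunction is encoded as a negative conjunction of inverted children, so
`x_or_y` above denotes `x or y`; the test `leq(x_and_y, x_or_y)` then succeeds
through Whitman's property.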

The simplify function in Algorithm 4 makes a one-level simplification of a
conjunction node, assuming that its children have already been simplified. We
present the case when τ is positive. It works in three steps. The subfunction
zeta corresponds to the $\zeta$ function from the previous section. It both
flattens consecutive positive conjunctions and applies a transformation based
on a strengthened version of the absorption law. Then at line 13, we filter
out the nodes made redundant by some other node, for example if $c \le b$
then $a \land b \land c$ becomes $a \land c$. This corresponds to function
$\eta$. Finally, line 16 applies the contradiction law, i.e. if
$a \land b \land c \le \neg a$ then $a \land b \land c$ becomes $0$. Note
again that checking only if either $b \le \neg a$ or $c \le \neg a$ holds is
not sufficient (see for example the case $a = (\neg b \lor \neg c)$). This
corresponds to the $\beta$ function. The correspondence with the three
functions $\zeta$, $\eta$ and $\beta$ is not exact; all computations are done
in a single traversal over the structure of the formula, rather than in
separate passes as the composition $\circ$ of functions in Theorem 2 might
suggest.


Importance of Structure Sharing. As detailed in Sect. 6, our implementation
finished in a few tenths of a second on circuits containing approximately $10^5$ And
gates, but whose expanded formula would have size over $10^{2000}$, demonstrating
the compatibility of the algorithm with structure sharing. For this, we must 
ensure at every phase and for every intermediate representation, from parsing of 
the input to exporting the solution, that no duplicate node is ever created. This is 
achieved, again, using memoization. The complete and testable implementation
of both the OL and OCBSL algorithms in Scala is available at
https://github.com/epfl-lara/lattices-algorithms.


5 Application to More Expressive Logics 


This section outlines how we use OCBSL and OL algorithms in program
verification. Boolean algebra is not only relevant for pure propositional
logic; it is also the core of more complex logics, such as the ones used for
verification of software.

Algorithm 4: Computing normal form

     1  def simplify(τ)   // Conjunction -> AIGFormula
            // Assume τ is positive
            // (In negative cases, some nodes must be inverted and ≤ reversed.)
     2      newChildren ← List()
     3      def zeta(child)
     4          match child:
     5              case PositiveConjunction:
     6                  newChildren.add(child.children)
     7              case child: NegativeConjunction:
     8                  gc ← child.children.find(gc → τ ≤ gc)
     9                  if isDefined(gc) then zeta(gc)
    10                  else newChildren.add(child)
    11      for child ← τ.children do
    12          zeta(child)
    13      children' ← // filter out redundant children smaller than another child
    14      if children'.size == 0 then return Literal(True)
    15      else if children'.size == 1 then return children'.head
    16      else if ∃c ∈ children'. τ ≤ inverse(c) then return Literal(False)
    17      else return Conjunction(children')
    18
    19  def NF_OL(τ)   // AIGFormula -> AIGFormula
    20      if isDefined(τ.normal) then return τ.normal
    21      else
    22          τ.normal ← match τ:
    23              case Variable(id, True): τ
    24              case Variable(id, False): inverse(NF_OL(inverse(τ)))
    25              case Conjunction(children, polarity): simplify(children map NF_OL, polarity)
    26          return τ.normal


Propositional terms appear as subexpressions of the program (as members of the 
Boolean type), but also in verification conditions corresponding to correctness 
properties. This section highlights key aspects of such a deployment. 

We consider programs containing let bindings, pattern matching, algebraic 
data types, and theories including numbers and arrays. Let bindings typically
arise when a variable is set in a program, but are also introduced in program
transformations to prevent exponential increase in the size of program trees.
Since OCBSL and OL are compatible with a DAG representation—fulfilling a 
similar role to let bindings—they can similarly “see through” bindings without 
breaking them or duplicating subexpressions. 

If-then-else and pattern matching conditions can be analyzed and used by the
algorithms, possibly leading to dead-branch removal or condition
simplification. Extending OCBSL and OL to reason about ADT sorts further
increases the simplification potential for pattern matching. For instance,
given assumptions $\phi$, a scrutinee $s$ and an ADT constructor identifier
$id$ of sort $S$, we are interested in determining whether $s$ is an instance
of the constructor $id$. A trivial case includes checking the form of $s$.
Otherwise, we can run OCBSL or OL to check whether
$\phi \implies (s \text{ is } id)$ holds. If $\phi \implies (s \text{ is } id)$
fails, we instead test whether $\phi \implies \neg(s \text{ is } id')$ for all
$id' \ne id \in S$. We may also answer the query negatively if
$\phi \implies (s \text{ is } id')$ for some $id' \ne id \in S$.

The original OCBSL algorithm presented in [20] achieves quasi-linear time 
complexity by assigning codes to subnodes such that equivalent nodes (by the 
laws of OCBSL) have the same codes. This is not required for the OL algorithm 
as it is quadratic anyway, but can still be done to allow common subexpres- 
sion elimination. This is similar to hash-consing, but more powerful, as it also 
eliminates expressions which are equivalent with respect to OCBSL or OL. 

Of particular relevance is the inclusion of underlying theories such as
numbers or arrays. OL has an advantage over OCBSL in terms of extensibility.
Namely, OL makes it possible to implement more properties of theories through
expansion of its $\le_{OL}$ relation (Algorithm 3) with inequalities between
syntactically distinct atomic formulas. For example, if $<_I$ and $\le_I$ are
relations on mathematical integers in the theory of the SMT solver, our
implementation deduces that $(x <_I y) \le_{OL} (x \le_I y)$ using the rule
$z + a <_I 0 \implies z + b \le_I 0$ when $b <_I a + 1$, instantiated with
$z = x - y$ and $a = b = 0$. In one of our benchmarks, this simple rule led OL
to simplify a verification condition (VC) of the form
$\neg(x <_I y \land \phi_1 \land x >_I y \land \phi_2)$ to true, which was of
interest because $\phi_1$ and $\phi_2$ were large. This simplification is
performed at line 16 of Algorithm 4 with
$\tau = x <_I y \land \phi_1 \land x >_I y \land \phi_2$, where we have
$c = x >_I y$ because
$\tau \le_{OL} (x \le_I y) \impliedby (x <_I y) \le_{OL} (x \le_I y)$. In
contrast, OCBSL was not able to do the simplification because it is not able
to systematically check for inequalities of subterms. For arrays, our
implementation also checks for the property
$(i \ne j) \le_{OL} (a[i := v](j) = a(j))$. Combined with two other rules,
related to congruence, OL performs particularly well for array-intensive
benchmarks such as SortedArray. Note that in OCBSL we may encode a weak form
of implication by specifying (giving the same code to)
$\phi \land \psi = \phi$ or $\phi \lor \psi = \psi$, but unlike the OL
encoding, this does not even allow simplifying formulas such as
$\phi \land \tau \land \neg\psi$ without a specific check, which would require
quadratic time in general.
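
As an illustration of how such a rule might be added to the base case of the
order test, consider the following Python sketch. The tuple encoding of
atoms and the function name atom_leq are hypothetical: an atom ("lt", z, a)
stands for $z + a <_I 0$ and ("le", z, b) stands for $z + b \le_I 0$, where
`z` is an opaque term and `a`, `b` are integer constants.

```python
def atom_leq(p, q):
    """Sufficient (not complete) test for p <=_OL q between two atoms.

    Applies syntactic equality, then the rule
        z + a < 0  implies  z + b <= 0   whenever   b < a + 1.
    """
    if p == q:
        return True
    (kind_p, z_p, a), (kind_q, z_q, b) = p, q
    return kind_p == "lt" and kind_q == "le" and z_p == z_q and b < a + 1


# x < y and x <= y, both normalized with z = x - y and a = b = 0.
strict = ("lt", "x - y", 0)
weak = ("le", "x - y", 0)
```

With this encoding, `atom_leq(strict, weak)` succeeds while the converse
`atom_leq(weak, strict)` does not, matching the direction of the rule.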


Other Extensions. Beyond program verification, we suspect OL or OCBSL based
techniques to be extendable to applications such as type checkers,
interactive and automated theorem provers using first order, higher order,
temporal and modal logics, SMT solvers or lattice problems in abstract
interpretation. Unidirectional rules which may be particularly relevant for
automated theorem proving include $[f(x) = f(y)] \ge_{OL} [x = y]$,
$[\forall x.\ P(x)] \le_{OL} P(t)$, and $P \le_{OL} Q$ when $P \to Q$ is a
known theorem. In the context of quantified logics and lambda calculus, both
algorithms are compatible with de Bruijn index representation of bound
variables. Both algorithms can be used as partial simplification before or
while applying more powerful but possibly incomplete heuristic simplification
methods, such as the simplification rule
$x \land F[x] \sim x \land F[x := 1]$ (which, if viewed as an equality axiom,
turns OL into Boolean algebra).


6 Evaluation


Our experimental evaluation comprises three parts. First, we analyze the behav- 
ior of the OL and OCBSL algorithms on large random formulas, to understand 
the feasibility of using them for normalization. Second, we evaluate the algo- 
rithms on combinatorial circuits [1]. Third and most importantly, we show their 
impact through a new simplifier for verification conditions of the Stainless [22] 
verifier. The goal of the simplifier is to avoid the need to invoke a solver for some 
of the formulas by reducing them to True, as well as to normalize them before 
storing them in a persistent cache file. The cache avoids the need to repeatedly 
prove previously proven verification conditions. By improving normalization, we 
improve the cache hit rate. We conduct all experiments on a server with 2x
Intel® Xeon® CPU E5-2680 v2 at 2.80 GHz, 40 cores including hyperthreading,
and 64 GB of memory.
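
The interplay between normalization and caching can be sketched as follows. This is only an illustration under simplifying assumptions: formulas are nested tuples, `normalize` merely sorts and deduplicates the operands of `and`/`or` (far weaker than the OCBSL and OL normal forms), the cache is an in-memory set rather than a persistent file, and `VCCache` and `prove` are hypothetical names, not part of Stainless.

```python
def normalize(f):
    """Toy normal form: sort and deduplicate the operands of 'and' / 'or'."""
    if isinstance(f, str):                 # a variable
        return f
    op, *args = f
    norm_args = [normalize(a) for a in args]
    if op in ("and", "or"):
        norm_args = sorted(set(norm_args), key=repr)
        if len(norm_args) == 1:            # e.g. x and x  ==>  x
            return norm_args[0]
    return (op, *norm_args)


class VCCache:
    """Cache of proven verification conditions, keyed by their normal form."""

    def __init__(self, prove):
        self.prove = prove                 # hypothetical solver callback
        self.proved = set()
        self.hits = 0
        self.solver_calls = 0

    def check(self, vc):
        key = normalize(vc)
        if key in self.proved:             # cache hit: no solver call needed
            self.hits += 1
            return True
        self.solver_calls += 1
        ok = self.prove(key)
        if ok:
            self.proved.add(key)
        return ok


# Two syntactically different but trivially equal VCs share one cache entry.
cache = VCCache(prove=lambda formula: True)   # stand-in for a real solver
cache.check(("and", "x", "y"))
cache.check(("and", "y", "x", "x"))
```

In this sketch the second query is answered from the cache because both formulas have the same normal form; a stronger normal form identifies more formulas and therefore can only increase the number of hits.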


6.1 Randomly Generated Propositional Formulas 


We first evaluate the two algorithms on randomly generated formulas. We mea- 
sure the running time and the reduction in formula size. We build the random 
formulas as follows. 


Definition 5. A random formula is parameterized by a size $s$ and a set of
available variables $X = \{x_1, \ldots, x_n\}$. Given a size $s$, if $s \leq 1$ then pick
uniformly at random a variable from $X$ or its negation and return it.
Otherwise, pick $t$ such that $0 \leq t \leq s - 1$ and generate two formulas $\phi_1$ and
$\phi_2$ of sizes $t$ and $s - 1 - t$. Return uniformly at random $\mathrm{And}(\phi_1, \phi_2)$ or
$\mathrm{Or}(\phi_1, \phi_2)$.
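
A generator in the spirit of Definition 5 might look like the sketch below. It is not the code used in the experiments: the tuple representation, the function names, and the choice to draw $t$ uniformly from $0, \ldots, s-1$ are assumptions made for illustration (the definition does not state how $t$ is distributed).

```python
import random


def random_formula(s, variables, rng):
    """Random formula of size s over `variables`, following Definition 5."""
    if s <= 1:
        v = rng.choice(variables)
        # Return the variable or its negation with equal probability.
        return v if rng.random() < 0.5 else ("not", v)
    t = rng.randint(0, s - 1)              # assumption: t is uniform on 0..s-1
    left = random_formula(t, variables, rng)
    right = random_formula(s - 1 - t, variables, rng)
    return (rng.choice(["and", "or"]), left, right)


def count_connectives(f):
    """Number of binary And/Or nodes in a formula built by random_formula."""
    if isinstance(f, str) or f[0] == "not":
        return 0
    return 1 + count_connectives(f[1]) + count_connectives(f[2])


rng = random.Random(0)                     # fixed seed for reproducibility
xs = [f"x{i}" for i in range(1, 51)]       # |X| = 50 variables, as in Fig. 1
formula = random_formula(200, xs, rng)
```

The recursion is kept deliberately simple; for very large sizes an iterative construction would avoid Python's recursion limit.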


Running Time. We show in Fig. 1a the approximate running time of both
algorithms for various sizes of formulas. We ran the experiment 21 times for each 
formula size category and took the median. For comparison with a theoretically 
linear time process, we also give the running time of the corresponding negation 
normal form transformation. These implementations do not come with low-level 
optimizations and are intended for demonstrating usability in practice, and do 
not serve as a competitive indicator. 
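
For reference, the negation normal form baseline can be sketched as a single traversal that pushes negations down to the variables. The tuple encoding and the function name `nnf` are assumptions for illustration; the sketch is linear in the size of the formula tree (a DAG representation would additionally need memoization to stay linear).

```python
def nnf(f, negate=False):
    """Push negations to the leaves using De Morgan's laws.

    Formulas are variables (str) or tuples ("not", g), ("and", ...), ("or", ...).
    `negate` records whether an odd number of negations is pending.
    """
    if isinstance(f, str):
        return ("not", f) if negate else f
    op = f[0]
    if op == "not":
        return nnf(f[1], not negate)       # absorb the negation, flip the flag
    dual = {"and": "or", "or": "and"}
    new_op = dual[op] if negate else op    # De Morgan: swap the connective
    return (new_op, *(nnf(a, negate) for a in f[1:]))


example = ("not", ("and", "a", ("not", ("or", "b", "c"))))
result = nnf(example)
```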


[Figure 1. Panel (a): Running time versus formula size for NNF, OCBSL and OL.
Panel (b): Size ratio (showing reduction) versus formula size for OCBSL and OL.]

Fig. 1. (a) Median running time of NNF and the two algorithms (log-log scale). (b) 
Median size of the normalized formulas relative to the original in NNF. |X| = 50 
variables. 

Size Reduction. For a fairer comparison, we apply a basic simplification
(flattening and transformation into negation normal form) to random formulas
before computing their size. We compare the number of connectives before and
after the simplification for both algorithms. We show the relative improvements
of the OL and OCBSL algorithms compared to the original formulas for various
sizes of formulas and 50 variables. We have run both algorithms 21 times and
report the median results in Fig. 1b.

It is interesting to note that the OL normal form is consistently and
significantly smaller than the OCBSL normal form, i.e., the absorption law
actually allows non-trivial reductions in size. This confirms that, in general,
the two algorithms trade speed against simplification strength.
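
As a small reminder of what the absorption law buys, the sketch below checks by truth table that $x \land (x \lor y)$ is equivalent to $x$ and compares the number of connectives. It only confirms that the law is sound in Boolean algebra; it is not the OL normalization algorithm, and all function names are illustrative.

```python
from itertools import product


def evaluate(f, env):
    """Evaluate a formula (str variable or (op, args...)) under a Boolean env."""
    if isinstance(f, str):
        return env[f]
    op, *args = f
    if op == "not":
        return not evaluate(args[0], env)
    values = [evaluate(a, env) for a in args]
    return all(values) if op == "and" else any(values)


def variables(f):
    if isinstance(f, str):
        return {f}
    return set().union(*(variables(a) for a in f[1:]))


def equivalent(f, g):
    """Brute-force equivalence check over all assignments (exponential)."""
    vs = sorted(variables(f) | variables(g))
    return all(
        evaluate(f, dict(zip(vs, bits))) == evaluate(g, dict(zip(vs, bits)))
        for bits in product([False, True], repeat=len(vs))
    )


def connectives(f):
    return 0 if isinstance(f, str) else 1 + sum(connectives(a) for a in f[1:])


absorbed = ("and", "x", ("or", "x", "y"))   # x AND (x OR y)
```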


6.2 Computing Normal Forms for Hardware Circuits 


Moving towards more realistic formulas, we assess the scalability of OCBSL and
OL on the EPFL Combinatorial Benchmark [1] comprising 10 arithmetic circuits
designed to challenge optimization tools, with up to $10^8$ gates.


Table 2. Results on the EPFL Combinatorial Benchmark. OL times out for hyp after
1 h.


|                | adder | bar   | div    | hyp    | log2   | max    | mult   | sin    | sqrt   | square |
| # of gates     | 50173 | 72704 | $10^7$ | $10^8$ | $10^7$ | $10^7$ | $10^7$ | $10^6$ | $10^7$ | $10^7$ |
| OCBSL Ratio    | 1.00  | 0.703 | 0.777  | 0.961  | 0.700  | 0.861  | 0.867  | 0.652  | 0.661  | 0.927  |
| OL Ratio       | 1.00  | 0.703 | 0.777  | -      | 0.697  | 0.861  | 0.865  | 0.647  | 0.661  | 0.927  |
| OCBSL Time [s] | 0.142 | 0.182 | 0.866  | 2.06   | 0.564  | 0.189  | 0.442  | 0.255  | 0.362  | 0.365  |
| OL Time [s]    | 0.276 | 0.338 | 706    | -      | 339    | 0.319  | 73.8   | 15.7   | 256    | 36.0   |


We run the experiment five times. We report the median running time and
the relative size after optimization in Table 2. We observe that the OCBSL
algorithm is nearly as good as the OL algorithm in all cases, and, moreover,
that it is very time-efficient even for problems with hundreds of millions of
gates. The OL algorithm sometimes performs slightly better and is almost as
time-efficient for inputs that are not too large, but becomes significantly more
time-consuming for inputs with more than approximately $10^6$ gates. These
results suggest, on the one hand, that OCBSL may be a more suitable reduction
technique for some applications with very large formulas, depending on their
internal structure. On the other hand, they suggest that both algorithms work
well in practice with Boolean circuits making heavy use of structure sharing.
Indeed, the expanded form of, for example,
the adder circuit would have about 2?°°° nodes. 
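
The gap between a shared (DAG) representation and its expanded tree can be illustrated with a toy circuit. The sketch below is not one of the EPFL circuits: `doubling_circuit` is a hypothetical example in which every gate uses the previous gate twice, so the tree size roughly doubles with each level while the DAG grows by one node.

```python
def doubling_circuit(depth):
    """DAG as a list: dag[i] holds the child indices of node i.

    Node 0 is an input; every later node is a gate with the previous node as
    both of its operands. Children always have smaller indices than parents.
    """
    dag = [[]]
    for _ in range(depth):
        prev = len(dag) - 1
        dag.append([prev, prev])
    return dag


def expanded_size(dag):
    """Number of nodes of the tree obtained by unsharing the last node."""
    size = []
    for children in dag:                   # children are already computed
        size.append(1 + sum(size[c] for c in children))
    return size[-1]


small = doubling_circuit(10)
large = doubling_circuit(1000)
```

Here `small` has 11 DAG nodes but an expanded size of 2047, and `large` has 1001 DAG nodes while its expanded size has more than 300 decimal digits, which is why algorithms that respect sharing are essential on circuits.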


6.3 Caching Verification Conditions in Stainless 


We implement the approach described in Sect. 5 by modifying the Stainless
verifier [22,40]¹, a publicly available tool for building formally verified
Scala programs.


1 https://github.com/epfl-lara/stainless/.

Our implementation adds two new simplifiers to Stainless: OCBSL-backed
and OL-backed. They are part of Stainless release v0.9.8² and are selectable by
the command line options --simplifier=ocbsl and --simplifier=ol respectively.
For the OL simplifier, we have extended the $\leq_{OL}$ relation with 12 simple
arithmetic and array rules.

We experimentally compare the two new simplifiers to the existing one (which
we denote Old). We use two groups of benchmarks: (1) six Stainless case studies
from the Bolts repository³ that take a significant amount of time to verify,
and (2) nine benchmark sets from automated grading of student assignments.
Together, this constitutes around 84,000 lines of Scala code, specifications, and
auxiliary assertions. We report the following metrics: the size of the VCs after
simplification, the number of cache hits, the number of VCs simplified to 1
(i.e., to True), the wall-clock time and the cumulative solving time. The
wall-clock time covers the full Stainless pipeline, from parsing the program to
outputting the result, including solver calls and VC simplification.

Fig. 2. VCs (tree) size scatter plot from all benchmarks for Old, OCBSL and OL.


Evaluation on Bolts Case Studies. We consider the following case studies 
from the mentioned Bolts repository: 


- LongMap (9613 VCs, 7091 LOC), a mutable hash map with 64-bit integer keys
  and open addressing, formalized by Samuel Chassot (EPFL) and proven to behave
  equivalently to a list of (key, value) pairs.

- A type checker for System F [19] (5040 VCs, 2501 LOC), formalized in
  Stainless by Andrea Gilot and Noé De Santo (EPFL). Among the key properties
  proven are type judgment uniqueness, preservation and progress.

- QOI (4487 VCs, 2812 LOC), an implementation of the Quite OK Image format.
  Decoding an encoded image is shown to yield the original image [10].

- RedBlack, a red-black tree (764 VCs, 796 LOC).

- SortedArray (472 VCs, 429 LOC), a mutable array preserving order on
  insertion. Developed for use in a simplified model of part of a file
  system [21].

- ConcRope (408 VCs, 621 LOC), a Conc-Tree rope [36], supporting amortized
  constant time append and prepend operations, based on a Leon
  formalization [30].


2 https://github.com/epfl-lara/stainless/releases/tag/v0.9.8.
3 https://github.com/epfl-lara/bolts.


We report the VC size measurements in Fig. 2, where we aggregate the results
from all benchmarks. Figure 2a reveals a couple of VCs whose size increased.
Inspection of these VCs shows that the increase is due to the new simplifiers
always inlining "simple expressions", such as field projections on free
variables, instead of keeping them bound. On average, OCBSL and OL decrease the
size of the VCs by 37% compared to Old. OL reduces the size of the VCs slightly
further than OCBSL (Fig. 2b).


[Figure 3. Bar charts comparing Old, OCBSL and OL on LongMap, SystemF, QOI,
RedBlack, SortedArray and ConcRope. Panel (a): Cache hits in a single run.
Panel (b): VCs simplified to 1. Panel (c): solving time. Panel (d): Wall-clock
time, with the times 15m5s, 15m5s, 93m2s, 2m41s, 4m13s and 1m55s printed beneath
the six benchmark labels.]


Fig. 3. Old, OCBSL and OL results for cache hits, VCs reduced to 1, solving and run- 
ning time. (c), (d) are normalized with respect to Old. In (c), the gray boxes represent 
the time spared due to extra cache hits and VCs reduced to 1 compared to Old. 


In Fig. 3a, we report the cache hit ratio. For the new simplifiers, reducing the 
formula size has the desired effect of noticeably increasing the hit ratio, especially 
for 4 out of 6 benchmarks. The additional power of OL helps for System F and 
SortedArray. 

We report in Fig. 3c not only the solving time for the two simplifiers
(normalized with respect to Old), but also the solving time saved thanks to
additional cache hits and VCs simplified to 1. ConcRope and RedBlack do not
benefit from the new simplifiers, while the other benchmarks do to various
degrees. For LongMap, adding the two ratios yields a ratio of $\approx 1$,
implying that the reduced solving time is due to extra caching; the solver did
not benefit from the new simplifiers for non-cached VCs. The System F benchmark
shows a ratio exceeding 1, meaning that OCBSL and OL did not help the solver
more than the extra time they took to run. For QOI and SortedArray, the combined
ratio is less than 1: the new simplifiers helped the solver for non-cached VCs.
OL performs significantly better than OCBSL in the SortedArray benchmark, thanks
to the extension of the $\leq_{OL}$ relation with array rules. We note that 25% of QOI
VCs have a size of more than 880, against 480 for the second benchmark
(SortedArray), and 450 for the third (LongMap).

Turning our attention to Fig. 3d, we note that on three of the benchmarks the
time saved on solver calls is essentially offset by the extra work done in the
new simplifiers. LongMap, SortedArray and especially QOI, however, show a net
benefit over Old.

OCBSL and OL simplifiers show the greatest improvement on large VCs. 
Note that the outcome of a Stainless run highly depends on user-provided asser- 
tions, which were hand-tuned under the Old simplifier. It is thus possible that 
new simplifiers have a disadvantage because they were not used during the ver- 
ification process. The additional power provided by the new simplifiers may 
make writing such intermediate assertions easier and faster, so we expect the 
full advantage of new simplifiers in newly developed verified software. 


Table 3. Results on programming assignments


| Benchmark          |       | filter | max  | mirror | mem  | sigma | nat   | uniq  | formula | lambda |
| # Submissions      |       | 210    | 216  | 96     | 136  | 734   | 381   | 147   | 677     | 782    |
| Cumulative LOC     |       | 2367   | 3452 | 1165   | 1987 | 8347  | 8950  | 3648  | 9226    | 17958  |
| # VCs              |       | 820    | 844  | 387    | 560  | 1528  | 2653  | 1352  | 9865    | 5922   |
| Solver Calls       | Old   | 28     | 81   | 44     | 77   | 75    | 133   | 264   | 1037    | 1115   |
|                    | OCBSL | 19     | 79   | 43     | 75   | 58    | 133   | 251   | 1033    | 1069   |
|                    | OL    | 18     | 79   | 42     | 74   | 50    | 131   | 251   | 1032    | 1066   |
| # VCs reduced to 1 | Old   | 211    | 302  | 95     | 151  | 4     | 886   | 381   | 1322    | 1320   |
|                    | OCBSL | 211    | 302  | 95     | 151  | 6     | 890   | 381   | 1327    | 1322   |
|                    | OL    | 213    | 302  | 95     | 151  | 794   | 890   | 381   | 1332    | 1322   |
| Cache Hits         | Old   | 581    | 461  | 248    | 332  | 1449  | 1634  | 707   | 7506    | 3487   |
|                    | OCBSL | 590    | 463  | 249    | 334  | 1464  | 1630  | 720   | 7505    | 3531   |
|                    | OL    | 589    | 463  | 250    | 335  | 684   | 1632  | 720   | 7501    | 3534   |
| VCs (tree) Size    | Old   | 6705   | 5576 | 3077   | 5097 | 47759 | 15378 | 12144 | 126968  | 78962  |
|                    | OCBSL | 6479   | 5546 | 3073   | 5063 | 49775 | 14514 | 11465 | 125289  | 75837  |
|                    | OL    | 6457   | 5546 | 2982   | 5000 | 34173 | 14482 | 11444 | 125037  | 75307  |
| Solving Time [s]   | Old   | 2.48   | 5.61 | 3.72   | 5.79 | 4.17  | 7.97  | 14.27 | 118.61  | 108.42 |
|                    | OCBSL | 1.91   | 5.22 | 3.52   | 5.75 | 3.43  | 5.73  | 14.27 | 102.48  | 104.27 |
|                    | OL    | 1.70   | 4.92 | 3.06   | 5.34 | 3.66  | 7.03  | 13.57 | 134.73  | 104.60 |
| Total Time [m:s]   | Old   | 0:27   | 0:36 | 0:16   | 0:21 | 0:59  | 14:02 | 1:36  | 51:01   | 115:39 |
|                    | OCBSL | 0:29   | 0:38 | 0:17   | 0:22 | 1:04  | 14:33 | 1:37  | 50:08   | 120:48 |
|                    | OL    | 0:29   | 0:38 | 0:16   | 0:22 | 1:10  | 14:43 | 1:46  | 58:05   | 116:09 |

Evaluation on Programming Assignments. We additionally evaluate our
approach on benchmarks consisting of many student solutions for several
programming assignments. We consider benchmarks from [32,33], obtained by
translation of student solutions in OCaml [38]. In this evaluation, we only
prove termination of all student solutions, which is one of the bottlenecks when
proving correctness of student solutions. We annotated all benchmarks with
explicit decreasing measures. Stainless generates verification conditions that
require the measure to decrease in recursive calls. Caching is particularly
desirable in this scenario, with many programs and a high degree of similarity.
Table 3 shows our evaluation results, comparing the two new simplifiers (OCBSL
and OL) to the old one.

First, we note that moving from Old to OCBSL to OL reduces the number of 
calls to the solver. Furthermore, many new VCs are proven valid by normaliza- 
tion alone (reduced to 1). The largest benefit of OL is in the sigma benchmark, 
where the subsumption of linear arithmetic literals in the simplifier substan- 
tially increases the number of formulas proven by normalization: from 6 (0.4%) 
in OCBSL to 794 (52%) for OL. 

The new simplifiers improve the number of cache hits, even if not as much
as for the Bolts case studies. The improvement is smaller because there is a
high degree of similarity across the submissions, so the Old simplifier already
achieves a large percentage of cache hits. Note also that the number of cache
hits in the sigma benchmark is smaller for OL because many of the VCs are proven
valid by the simplifier, avoiding the need to consult the cache or the solver in
the first place.
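
The sigma column of Table 3 makes this accounting concrete. The sketch below copies the sigma values from the table and checks an observation that is not stated explicitly in the text but holds for these numbers: solver calls, VCs reduced to 1, and cache hits add up to the number of VCs, which is consistent with each VC being discharged in exactly one of these three ways. The helper name `share` is illustrative.

```python
# Values copied from the sigma column of Table 3.
SIGMA_VCS = 1528
sigma = {
    "Old":   {"solver_calls": 75, "reduced_to_1": 4,   "cache_hits": 1449},
    "OCBSL": {"solver_calls": 58, "reduced_to_1": 6,   "cache_hits": 1464},
    "OL":    {"solver_calls": 50, "reduced_to_1": 794, "cache_hits": 684},
}


def share(count, total=SIGMA_VCS):
    """Percentage of VCs, rounded to one decimal place."""
    return round(100 * count / total, 1)


for name, row in sigma.items():
    # The three outcomes partition the VCs of the benchmark.
    assert sum(row.values()) == SIGMA_VCS
    print(name, {outcome: share(n) for outcome, n in row.items()})
```

With OL, roughly half of the sigma VCs never reach the cache lookup, which explains why its cache-hit count drops even though fewer solver calls are needed.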

Second, we notice a slight reduction in the overall VC size, with a couple of
exceptions where OCBSL resulted in a size increase due to inlining. Thanks to
formulas proven by normalization and improved cache hits, the overall solving
time decreases in several benchmarks. The wall-clock running time is
approximately unchanged, but we expect it to benefit as well in the future.


7 Conclusion 


We proposed a new approach to simplify and reason about formulas, based on 
algorithms which are sound and complete for the normal form problem (and the 
word problem) of two subtheories of Boolean algebra. These algorithms are sound 
but incomplete for Boolean algebras (and thus for the two-element Boolean
algebra of propositional logic). We introduced and proved the correctness of a
new algorithm to compute normal forms in a theory of ortholattices, which do not
enforce the distributivity law but only its weaker variation, absorption. Our
algorithm runs in time $O(n^2)$. A weaker subtheory, OCBSL, gives up the
absorption law. The disadvantage of OCBSL is a weaker normal form, whereas the
advantage is that we know of an algorithm running in subquadratic time,
$O(n \log(n)^2)$.
We evaluated both algorithms, using them to reduce the size of large random 
formulas and combinatorial circuits, showing that they work well with structure 
sharing. We also implemented the algorithms in the Stainless verifier, where 
computing normal forms reduced the size of formulas given to the solver and 
improved the cache hit ratio. Our experimental evaluation confirmed that the
tradeoff between normal form strength and the asymptotic complexity remains 
visible in practice. We found both algorithms useful in practice. OCBSL normal- 
ization has excellent running time even for very large circuits, so we believe it 
can replace the simpler negation normal form and syntactic equality checking at 
low cost in essentially all applications. The quadratic cost of the OL algorithm
is prohibitive on circuits over $10^7$ gates. However, this was not a problem for
its application to verification conditions in Stainless, where its added precision 
and the ability to compare atomic formulas made it more effective in normal- 
izing certain formulas to True and increasing cache hits. In some of the most 
difficult case studies, such as Quite OK Image Format [10], these improvements 
translated into substantial reduction of the wall clock time. Such measurable 
improvements, combined with theoretical guarantees, make the OL and OCBSL 
algorithms an appealing building block for verification systems. 

Abstract. This paper describes Kratos2, a tool for the verification of
imperative programs. Kratos2 operates on an intermediate verification 
language called K2, with a formally-specified semantics based on SMT, 
allowing the specification of both reachability and liveness properties. It 
integrates several state-of-the-art verification engines based on SAT and 
SMT. Moreover, it provides additional functionalities such as a flexible
Python API, a customizable C front-end, generation of counterexamples, 
support for simulation and symbolic execution, and translation into mul- 
tiple low-level verification formalisms. Our experimental analysis shows 
that Kratos2 is competitive with state-of-the-art software verifiers on a 
large range of programs. Thanks to its flexibility, Kratos2 has already 
been used in various industrial projects and academic publications, both 
as a verification back-end and as a benchmark generator. 


1 Introduction 


We present Kratos2, a tool for the verification of real-world imperative pro- 
grams. Kratos2 is a complete rewrite and redesign of Kratos [17], improving and 
extending it in multiple directions. First, Kratos2 introduces a simple yet expres- 
sive intermediate language called K2, with a formally-specified semantics based 
on Satisfiability Modulo Theories (SMT), which is parametric on the underlying 
SMT theory. K2 is expressive enough to capture most of the features of real-world 
C programs, such as pointers, dynamic memory allocation, floating-point data 
types, and bit-precise semantics of bounded integers, which the old version of the 
tool could not handle (being limited to C programs without pointers and recur- 
sion, and in which C integers were interpreted as mathematical integers). Kratos2 
comes with a separate C front-end c2Kratos that can translate C programs to 
K2. Second, Kratos2 includes a variety of state-of-the-art verification back-ends 
based on either symbolic model checking or symbolic execution with SAT and SMT 
solvers. Besides reachability properties, Kratos2 also supports various forms of 
liveness properties, which can be used to encode termination and more complex
linear-time temporal properties. Third, Kratos2 implements an interactive inter- 
preter, which can simulate K2 programs using non-deterministic inputs provided 
either by the user or by external oracles. Kratos2 also supports counterexample 
reconstruction, another feature not available in the original Kratos. 

The new intermediate language K2 enables modular translation of C programs
into various verification languages. Namely, Kratos2 can be used for
translating C programs into nuXmv [14], VMT [20], AIGER [9], BTOR2 [31],
Constrained Horn Clauses (CHCs) [11], or Boogie [29] formats. Additionally, Kratos2
comes with a Python API for construction and manipulation of K2 programs, 
which the users can leverage to implement custom front-ends and generators of 
K2 programs and also additional translators from K2 to other formalisms. 

Although Kratos2 has not been described in a publication until now, it has 
already been successfully used in several research and industrial projects. In 
particular, Kratos2 has been used as a back-end for the verification of automotive 
software in the context of the AUTOSAR platform [15,16]; of C code automatically 
generated from AADL specifications by the TASTE development environment [12]; 
and for verification of C code for railway interlocking systems automatically 
generated from the specifications in a controlled natural language [1]. Kratos2 
has also been used as a benchmark generator to produce symbolic transition 
systems from C programs [30]. 

The rest of the paper is structured as follows. The functionalities offered by 
Kratos2 from the user perspective are described in Sect. 2; Sect. 3 introduces K2, 
describing its syntax and formal semantics. The internal architecture of Kratos2, 
with details about its main components, is presented in Sect. 4; implementation 
notes and experimental evaluation on C programs from the annual software ver- 
ification competition SV-COMP are provided in Sect. 5. Finally, Sect. 6 concludes 
the paper and presents directions for future developments. 


2 Functional View 


In this section we provide a high-level overview of the functionalities available 
in Kratos2. More details will then be provided in the following sections. 


An Intermediate Language for Imperative Programs. The core of Kratos2 
is built around an idealized language for imperative programs called K2. Unlike 
common high-level real-world programming languages, K2 has a simple and clean 
semantics based on first-order logic modulo theories that is fully formally spec- 
ified. The K2 language, similar in spirit to other intermediate verification lan- 
guages proposed in the literature such as Boogie [29] or Why3 [26] (although 
less feature rich than the two), is at the same time simple enough to be easily 
manipulated and translated into formalisms used by SAT-based and SMT-based 
verification back-ends on one hand, and expressive enough to efficiently capture 
a significant subset of C on the other, as demonstrated also by our experimental
results on standard SV-COMP benchmarks (see Sect. 5).


Verification of Safety and Liveness with Multiple Back-Ends. Kratos2 
implements multiple state-of-the-art verification algorithms based on SAT and 
SMT, supporting both bit-precise reasoning over machine integers and floating- 
point numbers as well as higher-level reasoning based on, e.g., mathematical 
integers, real numbers, and uninterpreted functions, depending on the combina- 
tions of theories used in the input K2 program under analysis. Moreover, Kratos2 
supports not only the verification of safety properties (via a reduction to reach- 
ability of designated “error” program locations), but it also supports liveness 
properties such as proving that a specific program location is reached a finite 
number of times in all executions, or that it is always visited infinitely often in 
all infinite executions. 


A Python API for Program Manipulations. Kratos2 provides a rich and 
flexible Python API for parsing, printing, and manipulating K2 programs and 
expressions, which can be used to implement converters from high-level languages 
to K2 or to directly generate K2 programs from user-specific applications. 


A Customizable C Front-End. Kratos2 comes with a front-end for C pro- 
grams which supports a wide range of customization options for controlling the 
translation from C to K2. These range from the choice of theories to use to encode 
C data types (e.g., bit-vectors or unbounded integers), to the use of customized 
program transformations or the injection of new built-in functions with special 
meaning (such as special assume, malloc, or memset built-ins). Thanks to its 
plug-in architecture, the front-end can be easily customized for domain-specific 
subsets of C, for example to implement special optimization passes that are safe 
only in the given context, or to automatically inject properties to the code based 
on specification files (as is, e.g., the case in SV-COMP [3]). 


Encoding into Multiple Formalisms. Kratos2 can be used as an encoder 
or benchmark generator because it can translate imperative programs written 
in C or in K2 into other formalisms, including symbolic transition systems in 
nuXmv [14], VMT [20], AIGER [9] or BTOR2 [31] formats, Constrained Horn
Clauses (CHCs) [11], or other intermediate verification languages like Boogie [29]. 


Simulation and Symbolic Execution. Finally, Kratos2 can be used as an 
interpreter, allowing an (interactive) simulation of K2 programs and their sym- 
bolic execution, as an alternative to the verification back-ends based on model 
checking. 


3 The K2 Language 


In this section we introduce K2, the intermediate verification language used by 
Kratos2. We present its abstract syntax, formally define its semantics, and discuss 
its support for safety and liveness properties. 

<stmt>        ::= <assign-stmt> | <assume-stmt> | <call-stmt>
                | <havoc-stmt> | <jump-stmt> | <label-stmt>
<assign-stmt> ::= assign <symbol> <expr>
<assume-stmt> ::= assume <expr>
<call-stmt>   ::= call <symbol> <expr-list> <symbol-list>
<havoc-stmt>  ::= havoc <symbol>
<jump-stmt>   ::= jump <symbol> <symbol-list>
<label-stmt>  ::= label <symbol>
<symbol-list> ::= <symbol>*
<expr>        ::= <var-expr> | <op-expr>
<var-expr>    ::= var <symbol>
<op-expr>     ::= op <symbol> <expr-list>
<expr-list>   ::= <expr>*


Fig. 1. Abstract syntax of K2 statements and expressions. 


<program>        ::= <globals> <init> <functions-list> <entrypoint>
<globals>        ::= globals <var-decl-list>
<init>           ::= init <expr>
<functions-list> ::= <function>*
<entrypoint>     ::= entry <symbol>
<function>       ::= function <symbol> <var-decl-list> <var-decl-list>
                     <var-decl-list> <stmt-list>
<stmt-list>      ::= <stmt>*
<var-decl-list>  ::= <var-decl>*
<var-decl>       ::= var <symbol> <sort>


Fig. 2. Abstract syntax of K2 programs. 


Abstract Syntax. We denote lists of elements with an overbar, i.e., $\overline{a}$. If $\overline{a}$
is a list, $|\overline{a}|$ is its length, and if $i$ is a natural number, $\overline{a}_i$ is the $i$-th
element of $\overline{a}$. If $e$ is an element, $\overline{a} \cdot e$ is the list obtained by appending $e$
at the end of $\overline{a}$.


Definition 1 (Variables and Functions). A variable is a symbol with an
associated sort, as in the multi-sorted first-order logic. A function is a tuple
$(f, \overline{a}, \overline{r}, \overline{l}, \overline{\sigma})$, where:

- $f$, a symbol, is the name of the function;
- $\overline{a}$, a list of variables, are the formal parameters;
- $\overline{r}$, a list of variables, are the return variables;
- $\overline{l}$, a list of variables, are the local variables;
- $\overline{\sigma}$, a list of statements generated by the grammar of Fig. 1, are the body.


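Given a list of variables $\overline{v}$, we define $\mathrm{syms}(\overline{v})$ as the corresponding set of
symbols. Given a function $(f, \overline{a}, \overline{r}, \overline{l}, \overline{\sigma})$, we denote with $\mathrm{syms}(f)$ the set
$\mathrm{syms}(\overline{a}) \cup \mathrm{syms}(\overline{r}) \cup \mathrm{syms}(\overline{l})$. We extend the definition to lists of
statements $\overline{\sigma}$ in the natural way. We now describe K2 programs, whose abstract
syntax is shown in Fig. 2.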

Definition 2 (Programs). A program $P$ is a tuple $(\overline{g}, F, \iota, e)$, where:

- $\overline{g}$, a list of variables, are the global variables;
- $F$ is a partial mapping from symbols to functions;
- $\iota$, a formula, is the constraint on initial states;
- $e$, a symbol in $\mathrm{dom}(F)$, is the entry point.
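
Definitions 1 and 2 map naturally onto a few record types. The sketch below is one possible encoding in Python and is not the Kratos2 Python API: the class and field names (`Var`, `Function`, `Program`, `local_vars`, and so on) and the tuple encoding of statements and expressions are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Var:
    symbol: str
    sort: str                      # e.g. "int" or "bool" (illustrative)


@dataclass
class Function:
    name: str                      # f
    params: List[Var]              # formal parameters
    returns: List[Var]             # return variables
    local_vars: List[Var]          # local variables
    body: List[Tuple]              # statements, e.g. ("assign", "x", expr)

    def syms(self):
        """syms(f): symbols of parameters, return variables and locals."""
        return {v.symbol for v in self.params + self.returns + self.local_vars}


@dataclass
class Program:
    global_vars: List[Var]         # global variables
    functions: Dict[str, Function] # the partial mapping F
    init: Tuple                    # the constraint on initial states
    entry: str                     # entry point, must be in dom(F)

    def __post_init__(self):
        if self.entry not in self.functions:
            raise ValueError(f"entry point {self.entry!r} is not in dom(F)")


x = Var("x", "int")
main = Function("main", [], [], [x], [("assign", "x", ("op", "0", []))])
prog = Program([Var("g", "int")], {"main": main}, ("op", "true", []), "main")
```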




Semantics. We use the standard notions of theory, interpretation, model, and
satisfaction from many-sorted first-order logic and SMT [2]. In the following, we
assume that we have fixed a theory $T$ with equality that contains at least the
sort Bool. Given an interpretation $\mu$ that is a model for $T$, we define the
evaluation of an expression $e$ (generated by the grammar of Fig. 1) under $\mu$,
denoted $\mu[e]$, as $\mu[e] = \mu(v)$ for $e = \mathtt{var}\ v$ and
$\mu[e] = \mu(o)(\mu[\overline{p}_1], \ldots, \mu[\overline{p}_n])$ for $e = \mathtt{op}\ o\ \overline{p}$ and $n = |\overline{p}|$. We denote
with $\mu[v \mapsto e]$ the interpretation that maps $v$ to $e$, and that agrees with $\mu$
everywhere else, and with $\mu[\setminus v]$ any interpretation that agrees with $\mu$ on all
the symbols except $v$. Finally, if $e$ is of sort Bool, we write $\mu \models e$
to denote that $e$ evaluates to true under $\mu$.
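
The evaluation of expressions under an interpretation can be mirrored by a short recursive function. This is a sketch under stated assumptions: an interpretation is a Python dictionary that maps variable symbols to values and operator symbols to Python callables, expressions are tuples `("var", v)` or `("op", o, [args])`, and constants are modelled as nullary operators. None of these names come from Kratos2.

```python
import operator


def evaluate(expr, mu):
    """Evaluate expr under the interpretation mu (a dict)."""
    kind = expr[0]
    if kind == "var":
        return mu[expr[1]]                       # value of the variable
    if kind == "op":
        _, o, args = expr
        # Apply the interpretation of o to the evaluated arguments.
        return mu[o](*(evaluate(p, mu) for p in args))
    raise ValueError(f"unknown expression kind: {kind}")


def update(mu, v, value):
    """Interpretation that maps v to value and agrees with mu elsewhere."""
    new_mu = dict(mu)
    new_mu[v] = value
    return new_mu


mu = {"x": 3, "y": 4, "add": operator.add, "lt": operator.lt, "1": lambda: 1}
# x < y + 1
e = ("op", "lt", [("var", "x"), ("op", "add", [("var", "y"), ("op", "1", [])])])
holds = evaluate(e, mu)                          # mu satisfies e
holds_after = evaluate(e, update(mu, "x", 10))   # the updated one does not
```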


Definition 3 (Program states). Pairs $(f, i)$ where $f$ is a function name and
$i$ is a natural number are called program locations. A state of a program $P$ is
a pair $s = (G, C)$ where:

- $G$ is an interpretation for the global variables of $P$;
- $C$ is the current call stack, a list of triples $(f, i, L)$, where $(f, i)$ is a
  program location and $L$ is an interpretation of $\mathrm{syms}(f)$, i.e., of parameters,
  return variables, and local variables of $F(f)$.

A state $s$ is initial if and only if $G \models \iota$, $|C| = 1$ and $C_1 = (e, 1, L)$ for some
$L$. Given a state $s$ with $C_{|C|} = (f, i, L)$, we define the current interpretation
$\mu$ for $s$ as $\mu(v) = G(v)$ for $v \in \mathrm{syms}(\overline{g})$ and as $\mu(v) = L(v)$ otherwise.


We define the semantics for programs as a set of transition rules of the form
$s \xrightarrow{\sigma} s'$, where $s, s'$ are states and $\sigma$ is a statement. We then call a path
of a program $P$ any sequence of transitions (possibly infinite)
$s_0 \xrightarrow{\sigma_0} \ldots \xrightarrow{\sigma_j} s_{j+1} \ldots$ that complies with the transition rules and where
$s_0$ is an initial state.

The rules are shown in Fig. 3. In the definitions, we fix a program
$P = (\overline{g}, F, \iota, e)$ and use the following convenience functions, where $f$ is a
function name and $i$ a natural number: $\mathrm{arg}(f, i)$ returns the variable $\overline{a}_i$ of
the function $F(f)$; $\mathrm{ret}(f, i)$ returns the variable $\overline{r}_i$ of the function $F(f)$;
$\mathrm{stmt}(f, i)$ returns the statement $\overline{\sigma}_i$ of $F(f)$; $\mathrm{stmts}(f)$ returns the list of
statements $\overline{\sigma}$ of $F(f)$.
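
To give an operational feel for such transition rules, the sketch below implements a one-step successor function for a heavily simplified setting. It is an illustration, not the Kratos2 interpreter, and it rests on several assumptions: there is a single function without calls, globals and locals are merged into one environment, expressions are Python callables over that environment, and `havoc` picks values from a small finite `domain`.

```python
def successors(body, state, domain):
    """All states reachable in one step from state = (i, env).

    i is the 1-based index of the next statement in `body`.
    """
    i, env = state
    kind, *args = body[i - 1]
    if kind == "assign":
        v, e = args
        return [(i + 1, {**env, v: e(env)})]
    if kind == "assume":
        (e,) = args
        return [(i + 1, env)] if e(env) else []   # blocked if e is false
    if kind == "havoc":
        (v,) = args
        return [(i + 1, {**env, v: value}) for value in domain]
    if kind == "jump":
        (targets,) = args
        # Nondeterministically move to any of the listed labels.
        return [(k, env) for k, s in enumerate(body, start=1)
                if s[0] == "label" and s[1] in targets]
    if kind == "label":
        return [(i + 1, env)]
    raise ValueError(f"unknown statement kind: {kind}")


def final_envs(body, env, domain):
    """Environments of all paths that run past the last statement.

    Terminates only for loop-free bodies.
    """
    finals, stack = [], [(1, env)]
    while stack:
        state = stack.pop()
        if state[0] > len(body):
            finals.append(state[1])
        else:
            stack.extend(successors(body, state, domain))
    return finals


# if (x > 0) y = 1; else y = 0;   encoded with jumps and assumptions
body = [
    ("havoc", "x"),
    ("jump", ["then", "else"]),
    ("label", "then"),
    ("assume", lambda env: env["x"] > 0),
    ("assign", "y", lambda env: 1),
    ("jump", ["end"]),
    ("label", "else"),
    ("assume", lambda env: not env["x"] > 0),
    ("assign", "y", lambda env: 0),
    ("label", "end"),
]
finals = final_envs(body, {"x": 0, "y": 0}, domain=[0, 1])
```

Paths that take the wrong branch are cut off by the failing `assume`, so only the two consistent final environments remain.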


Reachability and Liveness. We then say that a state $s$ is reachable in $P$ iff
there exists a finite path $s_0 \xrightarrow{\sigma_0} \ldots \xrightarrow{\sigma_n} s$ that ends in $s$. Similarly, a program
location $(f, i)$ is reachable iff there exists a path as above in which
$\sigma_n = \mathrm{stmt}(f, i)$.¹


1 Note that here we assume w.l.o.g. that all statements in a program are different, 
even when they are structurally equal, so the above definition is unambiguous. 
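
assign-global: $(G, C \cdot (f, i, L)) \xrightarrow{\mathrm{stmt}(f,i)} (G[v \mapsto \mu[e]], C \cdot (f, i+1, L))$
    if $\mathrm{stmt}(f, i) = \mathtt{assign}\ v\ e$ and $v \in \mathrm{syms}(\overline{g})$;

assign-local: $(G, C \cdot (f, i, L)) \xrightarrow{\mathrm{stmt}(f,i)} (G, C \cdot (f, i+1, L[v \mapsto \mu[e]]))$
    if $\mathrm{stmt}(f, i) = \mathtt{assign}\ v\ e$ and $v \in \mathrm{syms}(f)$;

assume: $(G, C \cdot (f, i, L)) \xrightarrow{\mathrm{stmt}(f,i)} (G, C \cdot (f, i+1, L))$
    if $\mathrm{stmt}(f, i) = \mathtt{assume}\ e$ and $\mu \models e$;

havoc-global: $(G, C \cdot (f, i, L)) \xrightarrow{\mathrm{stmt}(f,i)} (G[\setminus v], C \cdot (f, i+1, L))$
    if $\mathrm{stmt}(f, i) = \mathtt{havoc}\ v$ and $v \in \mathrm{syms}(\overline{g})$;

havoc-local: $(G, C \cdot (f, i, L)) \xrightarrow{\mathrm{stmt}(f,i)} (G, C \cdot (f, i+1, L[\setminus v]))$
    if $\mathrm{stmt}(f, i) = \mathtt{havoc}\ v$ and $v \in \mathrm{syms}(f)$;

call: $(G, C \cdot (f, i, L)) \xrightarrow{\mathrm{stmt}(f,i)} (G, C \cdot (f, i, L) \cdot (g, 1, L'))$
    if $\mathrm{stmt}(f, i) = \mathtt{call}\ g\ \overline{e}\ \overline{r}$, where $L'(v) = \mu[\overline{e}_j]$ if $v = \mathrm{arg}(g, j)$;

return: $(G, C \cdot (f, i, L) \cdot (g, k, L'')) \longrightarrow (G', C \cdot (f, i+1, L'))$
    if $\mathrm{stmt}(f, i) = \mathtt{call}\ g\ \overline{e}\ \overline{r}$ and $k > |\mathrm{stmts}(g)|$, where:
    $G'(v) = \mu[\mathrm{ret}(g, j)]$ if $v = \overline{r}_j$, and $G'(v) = G(v)$ otherwise;
    $L'(v) = \mu[\mathrm{ret}(g, j)]$ if $v = \overline{r}_j$, and $L'(v) = L(v)$ otherwise;

jump: $(G, C \cdot (f, i, L)) \xrightarrow{\mathrm{stmt}(f,i)} (G, C \cdot (f, k, L))$
    if $\mathrm{stmt}(f, i) = \mathtt{jump}\ \overline{t}$ and $\mathrm{stmt}(f, k) = \mathtt{label}\ l$ with $l \in \overline{t}$;

label: $(G, C \cdot (f, i, L)) \xrightarrow{\mathrm{stmt}(f,i)} (G, C \cdot (f, i+1, L))$
    if $\mathrm{stmt}(f, i) = \mathtt{label}\ l$.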


Fig. 3. Transition rules. In all the rules, u denotes the current interpretation for the 
left-hand state of the rule. 


Conversely, if no such path exists, then $(f, i)$ is unreachable. The location
$(f, i)$ is infinitely-often reachable iff there exists an infinite path
$s_0 \xrightarrow{\sigma_0} \ldots \xrightarrow{\sigma_j} s_{j+1} \ldots$ in which for all indices $j$ there exists an index
$k > j$ such that $\sigma_k = \mathrm{stmt}(f, i)$. If no such path exists, then $(f, i)$ is
eventually unreachable. Finally, we say that $(f, i)$ is live iff it is
infinitely-often reachable in all infinite paths of $P$.
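
On a finite abstraction, these notions reduce to simple graph questions. The sketch below assumes a finite directed graph whose nodes stand for program locations and whose edges stand for transitions, which is a strong simplification of the state-based definitions above. In such a graph, a node is infinitely-often reachable exactly when it is reachable from the initial node and lies on a cycle. The function names are illustrative, and liveness over all infinite paths is not covered.

```python
from collections import deque


def reachable_from(graph, sources):
    """Set of nodes reachable from any node in `sources` (sources included)."""
    sources = list(sources)
    seen, queue = set(sources), deque(sources)
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def is_reachable(graph, init, target):
    return target in reachable_from(graph, [init])


def is_infinitely_often_reachable(graph, init, target):
    """True iff some infinite path from init visits target infinitely often."""
    if not is_reachable(graph, init, target):
        return False
    # target lies on a cycle iff it is reachable from one of its successors.
    return target in reachable_from(graph, graph.get(target, ()))


graph = {"a": ["b"], "b": ["c", "d"], "c": ["b"], "d": []}
```

In this example, `b` and `c` are infinitely-often reachable from `a`, `d` is reachable but eventually unreachable, and a node that does not occur in the graph is unreachable.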

In K2, queries about reachability or liveness of program locations are 
expressed via annotations of label statements. Annotations are metadata that 
are attached to statements, in the form of key-value pairs, which do not affect 
the semantics of the program, but are meant to provide additional information 
that can be used by tools that manipulate the K2 program. Specifically, Kratos2 
uses the following annotations to define properties: 


error <id>: holds iff all labels annotated with the same <id> are unreachable; 

notlive <id>: holds iff all labels annotated with the same <id> are eventually 
unreachable; 

live <id>: holds iff all labels annotated with the same <id> are live. 


These basic properties can be easily used to represent more common higher-level 
properties of programs, such as assertions and termination. For example, asser- 
tions can be reduced to reachability with a combination of assume and jump 
statements, whereas termination can be checked by adding a final self loop over a 
label with an attached live annotation. Finally, eventual unreachability can be
used to encode arbitrary LTL properties using the standard automata-theoretic
approach combined with a symbolic encoding of the accepting automaton such
as [22].²
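The reduction of assertions to reachability can be sketched as follows. This is a minimal illustration, not part of Kratos2 or c2Kratos: the helper name `lower_assert`, its parameters, and the generated label names are assumptions made for this sketch. It emits K2 statements as text, following the pattern used for the assertion in Fig. 4 (a nondeterministic jump, an assumption that the condition is violated, and a label carrying an `:error` annotation).

```python
def lower_assert(cond: str, prop_id: str, suffix: str = "") -> list[str]:
    """Return K2 statements (as text) that encode `assert(cond)`.

    `cond` is a K2 expression given as a string. The assertion holds iff the
    label annotated with `:error prop_id` is unreachable.
    """
    bad, ok, err = f"then{suffix}", f"else{suffix}", f"err{suffix}"
    return [
        f"(jump (label {bad}) (label {ok}))",    # nondeterministic choice
        f"(label {bad})",
        f"(assume (op not {cond}))",             # only violating runs continue
        f"(! (label {err}) :error {prop_id})",   # annotated error location
        f"(label {ok})",
    ]


if __name__ == "__main__":
    for stmt in lower_assert("(op eq glbl (const 0 cint))", "assert-fail"):
        print(stmt)
```

The `suffix` argument is only there to keep label names unique when a function contains several assertions.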


3.1 Example 


We conclude this section with a simple example of a C program and its equivalent
formulation in K2. Both versions are shown in Fig. 4. Most of the code
is translated in a fairly direct way (with conditional statements and structured 
loops translated into nondeterministic jumps constrained by assumptions). How- 
ever, since in K2, unlike in C, global variables are uninitialized by default, the 
K2 program contains an additional setup function (called init_and_main in 
the example) that sets glbl to zero before calling the original main. Another
point to highlight is the use of the :error annotation (highlighted in bold) to 
model the C assertion. 


4 Architectural View 


This section describes the main components of Kratos2 and the flow of infor- 
mation among them. From the high-level point of view, Kratos2 is composed 
of the front-end c2Kratos, which converts the input C program to the K2 lan- 
guage, and of the core Kratos2, which is responsible for parsing, simplifications, 
transformations, and verification of K2 code. This separation helps to keep the 
core Kratos2 simple, as it does not have to handle the complex semantic nuances 
of C. Moreover, it makes it easy to add front-ends for new languages by writing 
a separate translator from the language in question to K2. 

The front-end c2Kratos reads the input C file, builds its abstract syntax tree 
(AST) and then builds the corresponding K2 code in two passes. In the first 
pass, it converts the AST to an extended K2. Compared to the standard K2, 
the extended K2 also has primitives for pointers, records, complex loops, and 
compound instructions. These are removed in the second pass, by converting 
pointers to operations over maps, records to multiple variables, complex loops 
to sequences of assignments, jump instructions, and assumptions, and compound 
instructions to sequences of basic assignments to auxiliary variables. 
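To make the second pass more concrete, the lowering of a structured loop into labels, jumps, and assumptions can be sketched as below. The function `lower_while` and its label-naming scheme are hypothetical (they are not the c2Kratos implementation); the emitted shape follows the translation of the `while` loop shown in Fig. 4.

```python
def lower_while(cond: str, body: list[str], tag: str = "while") -> list[str]:
    """Lower `while (cond) { body }` to K2 statements given as text.

    `cond` is a K2 expression and `body` a list of already-lowered K2
    statements. Entering the loop assumes `cond`; leaving it assumes the
    negation of `cond`.
    """
    head, inside, done = tag, f"in{tag}", f"end{tag}"
    return (
        [
            f"(label {head})",
            f"(jump (label {inside}) (label {done}))",  # enter or exit
            f"(label {inside})",
            f"(assume {cond})",
        ]
        + body
        + [
            f"(jump (label {head}))",                   # back edge
            f"(label {done})",
            f"(assume (op not {cond}))",
        ]
    )


if __name__ == "__main__":
    for stmt in lower_while("(op gt y (const 0 cint))", ["(call f y y)"]):
        print(stmt)
```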

The core Kratos2 consists of several components, whose relationships are 
visualized in Fig. 5: 


² In the case of LTL properties, the question arises as to what to consider as an atomic
step of the program. This is both crucial and application-dependent: for example, in 
embedded software consisting of a “transition function” that is executed periodically, 
it might make sense to consider each call to such function as one step, whereas in 
other contexts a more fine-grained notion of step might be needed. K2 (and Kratos2) 
makes no commitment about this, providing only the support for eventual unreach- 
ability of label statements, which can always be defined unambiguously. 

C version:

    int glbl;

    int f(int x)
    {
      if (glbl > 0) {
        return x - 1;
      } else {
        glbl = 0;
        return x;
      }
    }

    void main(void)
    {
      int y;
      while (y > 0) {
        y = f(y);
      }
      assert(glbl == 0);
    }

K2 version:

    (type cint (sbv 32))
    (entry init_and_main)
    (globals (var glbl cint))

    (function f ((var x cint))
      (return (var ret cint)) (locals)
      (seq
        (jump (label then) (label else))
        (label then)
        (assume (op gt glbl (const 0 cint)))
        (assign ret (sub x (const 1 cint)))
        (jump (label end))
        (label else)
        (assume (op not (op gt glbl (const 0 cint))))
        (assign glbl (const 0 cint))
        (assign ret x)
        (label end)))

    (function main () (return) (locals (var y cint))
      (seq
        (label while)
        (jump (label inwhile) (label endwhile))
        (label inwhile)
        (assume (op gt y (const 0 cint)))
        (call f y y)
        (jump (label while))
        (label endwhile)
        (assume (op not (op gt y (const 0 cint))))
        (jump (label then) (label else))
        (label then)
        (assume (op not (op eq glbl (const 0 cint))))
        (! (label err) :error assert-fail)
        (label else)))

    (function init_and_main () (return) (locals)
      (seq
        (assign glbl (const 0 cint))
        (call main)))

Fig. 4. Example C program and its K2 translation.


CFG builder and simplifier reads the input K2 file and builds the correspond- 
ing interprocedural control flow graph (CFG). It then performs several simplifica- 
tions of the CFG, such as constant propagation and lightweight slicing. The result 
can be used either by the interpreter, symbolic executor, or one of the encoders. 
The simplified CFG can also be converted back into a K2 representation. 


Interpreter interprets the CFG using the externally provided inputs to guide 
the execution. The inputs contain new values for all havoc commands and also 
destination labels for all nondeterministic jump commands. The inputs can be 
provided by the user, a random generator, or by one of the verification engines. 
The last option is used for counterexample reconstruction and validation. 


Transition system encoder encodes the CFG to a symbolic transition system 
over a suitable theory. The encoder first inlines all function calls in the program. 
It then encodes the resulting inlined program using large block encoding [4], 
which allows encoding larger acyclic subgraphs of the CFG by a single transition 
formula. The resulting transition system can be verified by one of the available 
verification back-ends, or converted to a textual representation in one of the 
available output formats (VMT [20], nuXmv [14], BTOR2 [31], or AIGER [9]).³


³ Depending on the features of the input K2 program, some of the verification back-
ends or output formats might not be available. E.g., SAT-based engines are not
available if the K2 program contains some infinite-state variables.

Fig. 5. Architecture of Kratos2.


CHC encoder converts the CFG to a set of Constrained Horn Clauses [11]. In 
contrast to the transition system encoder, the CHC encoder supports interproce- 
dural analysis and recursive functions, encoded as a set of non-linear CHCs as 
described, e.g., in [28]. 


Symbolic executor implements a classical symbolic execution algorithm with 
iterative deepening to avoid getting stuck in long uninteresting branches. It sup- 
ports (possibly recursive) K2 programs over arbitrary combinations of integers, 
reals, bit-vectors, floats, and arrays. 


SMT-based engines encompass several SMT-based verification algorithms of 
symbolic transition systems. For reachability properties, Kratos2 implements 
standard bounded model checking (BMC) [7], k-induction [32], and IC3 with 
implicit predicate abstraction [18]. For liveness properties, we use a procedure 
combining liveness-to-safety reduction with ranking functions synthesis [23]. 


SAT-based engines encompass several verification algorithms of finite-state 
symbolic transition systems. Namely, for transition systems over the theory of 
bit-vectors and floats, Kratos2 offers BMC, k-induction, and different variants 
of IC3 [13], working over the bit-blasted Boolean transition system, for both 
reachability and liveness properties. Additionally, Kratos2 implements a dedi- 
cated engine for reachability properties in transition systems over the theory of 
bit-vectors, floats, and arrays similar to [10,30]. 
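The role of the interpreter described above, which is driven by externally provided inputs, can be illustrated with a small sketch. It is not the Kratos2 interpreter: it works on a single function without calls, statements are Python tuples rather than CFG nodes, and the names `interpret` and `PROGRAM` are assumptions of this example. The point it shows is that each havoc and each nondeterministic jump consumes one input, so a fixed input sequence determines a unique run (as needed for counterexample replay).

```python
def interpret(stmts, inputs, env=None, max_steps=1000):
    """Run `stmts`, consuming one input per havoc and per nondeterministic jump.

    Returns (status, env, visited_labels); status is "done", "blocked"
    (an assume failed) or "out-of-steps".
    """
    env = dict(env or {})
    inputs = iter(inputs)
    labels = {s[1]: i for i, s in enumerate(stmts) if s[0] == "label"}
    visited, pc = [], 0
    for _ in range(max_steps):
        if pc >= len(stmts):
            return "done", env, visited
        kind, *args = stmts[pc]
        if kind == "assign":
            var, expr = args
            env[var] = expr(env)
        elif kind == "havoc":
            env[args[0]] = next(inputs)          # new value comes from the inputs
        elif kind == "assume":
            if not args[0](env):
                return "blocked", env, visited   # this run is infeasible
        elif kind == "jump":
            targets = args[0]
            target = targets[0] if len(targets) == 1 else next(inputs)
            if target not in targets:
                raise ValueError(f"input {target!r} is not a target of the jump")
            pc = labels[target]
            continue
        elif kind == "label":
            visited.append(args[0])
        pc += 1
    return "out-of-steps", env, visited


# A loop that decrements y while y > 0, then reaches the label "err".
PROGRAM = [
    ("havoc", "y"),
    ("label", "while"),
    ("jump", ["inwhile", "endwhile"]),
    ("label", "inwhile"),
    ("assume", lambda e: e["y"] > 0),
    ("assign", "y", lambda e: e["y"] - 1),
    ("jump", ["while"]),
    ("label", "endwhile"),
    ("assume", lambda e: not (e["y"] > 0)),
    ("label", "err"),
]

if __name__ == "__main__":
    print(interpret(PROGRAM, [2, "inwhile", "inwhile", "endwhile"]))
```

An input sequence that picks a branch whose assumption fails yields the status "blocked", which corresponds to an infeasible path rather than an error.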


5 Implementation and Experimental Evaluation 


Implementation. Core Kratos2 is implemented in C++ on top of the MathSAT5 [19]
SMT solver and the nuXmv [14] symbolic model checker. The SAT-based
verification engine additionally makes use of the MiniSat [25] and CaDiCaL [8] 
SAT solvers. The front-end c2Kratos is implemented in Python and relies on 
pycparser for parsing of the input C program. Kratos2 is freely available for 
non-commercial purposes from https://kratos.fbk.eu. 

Table 1. Solved benchmarks by the three compared tools. Column U shows the number
of solved unsafe benchmarks, S of safe benchmarks, and W of wrong results.

                    CPAchecker           Kratos2             VeriAbs
Family              U     S    W       U     S    W       U     S    W
arrays             70     5    0      75     7    0     106   261    0
bitvectors         13    31    0      13    33    0      14    31    0
combinations      295    36    0     282    47    0     277    77    0
controlflow        39    36    0      40    37    0      40    47    0
eca               223   481    0     210   365    0     467   600    0
floats             41   356    0      43   350    0      43   393    0
heap               71   118    1      67   102    0      70   120    0
loops             152   334    2     159   307    0     192   427    0
productlines      265   332    0     262   315    0     260   322    0
recursive          40    36    1      43    28    0      46    41    0
sequentialized    347   108    0     361    68    0     361   123    0
xcsp               50    52    0      51    51    0      52    52    0
Total            1606  1925    4    1606  1710    0    1928  2494    0


Experimental Setup. We performed an experimental evaluation to answer two 
research questions: Is the K2 language expressive enough to efficiently represent 
realistic C programs? Do the engines implemented in Kratos2 offer reasonable 
performance on realistic verification tasks? To this end, we considered all the 
C programs from the ReachSafety category of the 2022 edition of the annual 
software verification competition SV-COMP [3]. The category consists of 5400 C
programs divided into 12 benchmark families. We compared Kratos2 with Veri- 
Abs 1.4.2 [24] and CPAchecker 2.2 [5], respectively the winner and runner-up 
of the ReachSafety category of SV-COMP 2022. Similarly to the approach used 
by CPAchecker, we executed Kratos2 in sequential portfolio mode, which succes- 
sively runs symbolic execution, SMT-based IC3, SAT-based IC3, and SMT-based 
BMC with predetermined time-outs for each of the engines. 
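The sequential portfolio mode can be pictured with the following sketch. It is an assumption-laden illustration rather than the Kratos2 implementation: engines are modelled as Python callables that receive a deadline and are expected to stop by themselves, and the names `run_portfolio`, the verdict strings, and the budgets are made up for this example.

```python
import time


def run_portfolio(task, engines):
    """Run engines one after another until one returns a definite verdict.

    `engines` is a list of (name, engine, budget_seconds). Each engine is a
    callable engine(task, deadline) returning "safe", "unsafe" or "unknown";
    it is expected to give up with "unknown" once `deadline` has passed.
    """
    for name, engine, budget in engines:
        deadline = time.monotonic() + budget
        verdict = engine(task, deadline)
        if verdict in ("safe", "unsafe"):
            return name, verdict        # first conclusive answer wins
    return None, "unknown"              # every engine ran out of budget


if __name__ == "__main__":
    # Stub engines standing in for the real ones, in the order used in the paper.
    stubs = [
        ("symbolic execution", lambda task, deadline: "unknown", 1.0),
        ("SMT-based IC3", lambda task, deadline: "safe", 1.0),
        ("SAT-based IC3", lambda task, deadline: "unknown", 1.0),
        ("SMT-based BMC", lambda task, deadline: "unknown", 1.0),
    ]
    print(run_portfolio("example.k2", stubs))
```

Ordering cheap, incomplete engines first means that easy tasks are settled quickly, while the later engines only run on tasks the earlier ones could not decide.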

The experiments were performed on several identical PCs equipped with Intel 
Core i7-8700 CPU @ 3.20 GHz and 32 GiB of RAM. Each execution was limited 
to use a single CPU core, 15 min of CPU time, and 8 GiB of RAM. For reliable
benchmarking, all experiments were executed using BenchExec [6]. A replication
package describing the details of the setup is available at
https://doi.org/10.5281/zenodo.7890411.


Results. To answer the first research question, we observe that from the total 
5400 benchmarks, only 56 were not converted to K2 by c2Kratos due to unsup- 
ported floating point built-ins or features such as variable length arrays. 

To answer the second research question, Table 1 shows the numbers of solved 
benchmarks by the individual tools and quantile plots in Fig. 6 show their running
times. The results show that Kratos2 is competitive with CPAchecker on
all benchmark families except for eca. It is also competitive with VeriAbs on 
most benchmark families. There are 23 benchmarks uniquely solved by Kratos2, 
48 by CPAchecker, and 1039 by VeriAbs. Moreover, both Kratos2 and VeriAbs 
produced no wrong results, unlike most other participants of SV-COMP. 
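As a sanity check of the figures in Table 1, the per-family counts can be summed and compared against the Total row. The script below is only a bookkeeping aid: the cell values are transcribed from Table 1, and the names `TABLE1`, `totals`, and `solved_share` are introduced here for illustration.

```python
TOOLS = ("CPAchecker", "Kratos2", "VeriAbs")

# family -> one (U, S, W) triple per tool, in the order of TOOLS
TABLE1 = {
    "arrays":         ((70, 5, 0),    (75, 7, 0),    (106, 261, 0)),
    "bitvectors":     ((13, 31, 0),   (13, 33, 0),   (14, 31, 0)),
    "combinations":   ((295, 36, 0),  (282, 47, 0),  (277, 77, 0)),
    "controlflow":    ((39, 36, 0),   (40, 37, 0),   (40, 47, 0)),
    "eca":            ((223, 481, 0), (210, 365, 0), (467, 600, 0)),
    "floats":         ((41, 356, 0),  (43, 350, 0),  (43, 393, 0)),
    "heap":           ((71, 118, 1),  (67, 102, 0),  (70, 120, 0)),
    "loops":          ((152, 334, 2), (159, 307, 0), (192, 427, 0)),
    "productlines":   ((265, 332, 0), (262, 315, 0), (260, 322, 0)),
    "recursive":      ((40, 36, 1),   (43, 28, 0),   (46, 41, 0)),
    "sequentialized": ((347, 108, 0), (361, 68, 0),  (361, 123, 0)),
    "xcsp":           ((50, 52, 0),   (51, 51, 0),   (52, 52, 0)),
}


def totals(table):
    """Sum the U, S and W columns of every tool over all families."""
    sums = [[0, 0, 0] for _ in TOOLS]
    for row in table.values():
        for t, cells in enumerate(row):
            for c, value in enumerate(cells):
                sums[t][c] += value
    return {tool: tuple(s) for tool, s in zip(TOOLS, sums)}


def solved_share(table, n_benchmarks=5400):
    """Fraction of the benchmark set solved (U + S) by each tool."""
    return {tool: (u + s) / n_benchmarks for tool, (u, s, _w) in totals(table).items()}


if __name__ == "__main__":
    print(totals(TABLE1))
    print(solved_share(TABLE1))
```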

Fig. 6. Quantile plots of solved benchmarks for all three compared tools in individual
benchmark families. The plot shows the number of benchmarks (y-axis) that were 
solved within the given number of seconds (x-axis). 


We remark that CPAchecker is an established and optimized software verifier
that regularly scores high in software verification competitions, and that VeriAbs
implements algorithm selection heuristics, using both its own custom engines and
external state-of-the-art verifiers. As such, it is not surprising that VeriAbs performs
much better than Kratos2 and CPAchecker on some of the families.

We conclude that the K2 language is expressive enough to efficiently capture a 
significant subset of C used in realistic programs. Furthermore, the verification 
engines implemented in Kratos2 mostly offer a performance comparable with 
state-of-the-art software verifiers. 


6 Conclusions and Future Work 


We have described Kratos2, a mature software verifier for imperative programs 
written in K2, a new intermediate verification language with a formal semantics 
based on SMT. Kratos2 is a complete rewrite of the original Kratos tool, offering 
significant extensions in functionalities and performance. The tool has already 
been successfully applied in various contexts, both industrial and academic. 

As future work, we will consolidate the (currently alpha-quality) implemen- 
tation of the ESST algorithm of the original Kratos [21] to handle multithreaded 
programs with cooperative scheduling. We will also investigate a tighter integra- 
tion with CHC solvers to better handle recursive programs, as well as improved 
techniques to handle arrays and pointers such as [27,33]. On the language side, 
we plan to add support for contracts and pre-/post-conditions via annotations. 

Abstract. We show that interactive protocols between a prover and a
verifier, a well-known tool of complexity theory, can be used in practice 
to certify the correctness of automated reasoning tools. 

Theoretically, interactive protocols exist for all PSPACE problems. 
The verifier of a protocol checks the prover’s answer to a problem instance 
in probabilistic polynomial time, with polynomially many bits of commu- 
nication, and with exponentially small probability of error. (The prover 
may need exponential time.) Existing interactive protocols are not used 
in practice because their provers use naive algorithms, inefficient even for 
small instances, that are incompatible with practical implementations of 
automated reasoning. 

We bridge the gap between theory and practice by means of an inter- 
active protocol whose prover uses BDDs. We consider the problem of 
counting the number of assignments to a QBF instance (#CP), which
has a natural BDD-based algorithm. We give an interactive protocol for
#CP whose prover is implemented on top of an extended BDD library.
The prover has only a linear overhead in computation time over the nat- 
ural algorithm. 

We have implemented our protocol in blic, a certifying tool for #CP. 
Experiments on standard QBF benchmarks show that blic is compet- 
itive with state-of-the-art QBF-solvers. The run time of the verifier is 
negligible. While loss of absolute certainty can be concerning, the error 
probability in our experiments is at most $10^{-10}$ and reduces to $10^{-10k}$
by repeating the verification k times. 

1 Introduction


Automated reasoning tools often underlie our assertions about the correctness 
of critical hardware and software components. In recent years, the scope and 
scalability of these techniques have grown significantly. 

Automated reasoning tools are not immune to bugs. If we are to trust their 
verdict, it is important that they provide evidence of their correct behaviour. A 
substantial amount of research has gone into proof-producing automated reason- 
ing tools [4,14,16,22,23]. These works define a notion of “correctness certificate” 
suitable for the reasoning problem at hand, and adapt the reasoning engine to 
produce independently checkable certificates. For example, SAT solvers produce 
either a satisfying assignment or a proof of unsatisfiability in some proof system, 
e.g. resolution (see [16] for a survey). Extending such certificates beyond boolean 
SAT is an active area of current research [3,4, 18,24, 29]. 

In the worst case, the size of certificates grows exponentially in the size of 
the input, even for boolean unsatisfiability (unless NP = coNP). If users have 
limited computational or communication resources, transferring and checking 
large certificates becomes a burden. Large certificates are not just a theoretical 
curiosity. In practice, resolution proofs for complex SAT problems may run to 
petabytes [15]. Ideally, we would prefer “small” certificates (polynomial in the 
size of the input) which can be checked independently in polynomial time. 

The IP = PSPACE theorem proves that certification with polynomial ver- 
ification time is possible for any problem in PSPACE, provided one trades off 
absolute certainty for certainty with high probability [27]. The complexity class 
IP consists of those languages for which there is a polynomial-round, complete 
and sound interactive protocol [1,2,13,20]: a sequence of interactions between a
(computationally unbounded) prover and a (computationally bounded) verifier 
after which the verifier decides whether the prover correctly performed a compu- 
tation. The protocol is complete if, whenever an input belongs to the language, 
there is an honest prover who can convince a polynomial-time randomised ver- 
ifier in a polynomial number of rounds. The protocol is sound if, whenever an 
input does not belong to the language, the Verifier will reject the input with 
high probability—no matter what certificates are provided to the Verifier. That 
is, a “Prover” cannot fool the certification process. 

Since every language in PSPACE has an interactive protocol, there are interac- 
tive protocols for UNSAT, QBF, counting QBF, safety verification of concurrent 
state machines, etc. Observe that the prover of a protocol may perform expo- 
nential time computations (which is unavoidable unless P = PSPACE), but the 
verifier only requires polynomial time in the original input. 

If interactive protocols provide a foundation for small and efficiently verifiable 
certificates (at least for problems in PSPACE), why are they not in widespread 
practice? We believe the reason to be the following: for asymptotic complexity 
purposes, it suffices to use honest provers with best-case exponential complexity 
that naively enumerate all possibilities. Such provers are incompatible with auto- 
mated reasoning tools, which use more sophisticated data structures and heuris- 
tics to scale to real-world examples. So we need to make practical algorithms 
for automated reasoning efficiently certifying. We call an algorithm efficiently
certifying if, in addition to computing the output, it can execute the steps of an 
honest prover in an interactive protocol with only polynomial overhead over its 
running time. 

In this paper, we show that algorithms using reduced ordered binary decision 
diagrams (henceforth called BDDs) [9] can be made efficiently certifying. We 
consider #CP, the problem of computing the number of satisfying assignments 
of a circuit with partial evaluation (CP). Besides boolean nodes, a CP contains 
partial evaluation nodes $\pi_{[x:=\mathit{false}]}$ (resp., $\pi_{[x:=\mathit{true}]}$) that take a boolean predicate
as input, say $\varphi$, and output the result of setting $x$ to false (resp., true) in $\varphi$. #CP
generalises SAT, QBF, and counting SAT (#SAT), and has a natural algorithm
using BDDs: Compute BDDs for each node of the circuit in topological order, 
and count the accepting paths of the final BDD. 

The theoretical part of the paper proceeds in two steps. First, we present 
CPCERTIFY, a complete and sound interactive protocol for #CP. CPCERTIFY 
is similar to the SUMCHECK protocol [20]. It involves encoding boolean formulas 
as polynomials over a finite field. The prover is responsible for producing certain 
polynomials from the original circuit and evaluating them at points of the field 
chosen by the verifier. These polynomials are either multilinear (all exponents 
are at most 1) or quadratic (at most 2). 

Second, we show that an honest prover in CPCERTIFY can be implemented 
on top of a suitably extended BDD library. The run times of the certifying BDD 
algorithms are only a constant overhead over the computation time without 
certification; they depend linearly on the total number of nodes of the intermediate
BDDs computed by the prover to solve the #CP instance. We use two key
insights. The first is an encoding of multilinear polynomials as BDDs; we show 
that the intermediate BDDs represent all the multilinear polynomials a prover 
needs during the run of CPCERTIFY. The second shows that the quadratic poly- 
nomials correspond to intermediate steps during the computation of the interme- 
diate BDDs. We extend BDDs with additional “book-keeping” nodes that allow 
the prover to also compute the quadratic polynomials while solving the problem. 
So computing the polynomials required by CPCERTIFY has zero additional cost; 
the only overhead is the cost of evaluating the polynomials at the field points 
chosen by the verifier. 

We have implemented a certifying #CP solver based on our extended BDD 
library. Our experiments show that the solver is competitive with state-of-the-art 
non-certifying QBF solvers, and can outperform certifying QBF solvers based 
on BDDs. The number of bytes exchanged between the prover and the verifier
is an order of magnitude smaller, and Verifier’s run time several orders
of magnitude smaller, than current encodings of QBF proofs, while bounding
the error probability to below $10^{-10}$. Thus, our results open the way for practically
efficient, probabilistic certification of automated reasoning problems using
interactive protocols. 


Additional Related Work. Proof systems for SAT and QBF remain an active 
area of research, both in theoretical proof complexity and in practical tool
development. Jussila, Sinz, and Biere [17,28] showed how to extract extended
resolution proofs from BDD operations. This is the basis for proof-producing SAT
and QBF solvers based on BDDs [6-8]. As in our work, the proof uses inter- 
mediate nodes produced in the construction of the BDD operations. We focus 
on interactive certification instead of extended resolution proofs, which can be 
exponentially larger than the input formula. 

Recently, Luo et al. [21] consider the problem of providing zero-knowledge 
proofs of unsatisfiability, a motivation similar but not equal to ours. Their tech- 
niques require the verifier to work in time polynomial in the proof, which can be 
exponentially bigger than the input formula. In contrast, the verifier of CPCER- 
TIFY runs in polynomial time in the input. Since any language in PSPACE has a 
zero knowledge proof [5], our protocol can in principle be made zero knowledge. 
Whether that system scales in practice is left for future work. 


Full Version. Detailed proofs can be found in the full version of the paper [11]. 


2 Preliminaries 


The Class IP. An interactive protocol between a Prover and a Verifier con- 
sists of a sequence of interactions in which a Verifier asks questions to a Prover, 
receives responses to the questions, and must ultimately decide if a common 
input x belongs to a language. The computational power of the Prover is 
unbounded but the Verifier is a randomised, polynomial-time algorithm. 

Formally, let P, V denote (deterministic) Turing machines. 

We say that $(r; m_1, \ldots, m_{2k})$ is a k-round interaction, with $r, m_1, \ldots, m_{2k} \in \{0,1\}^*$,
if $m_{i+1} = V(r, m_1, \ldots, m_i)$ for even $i$ and $m_{i+1} = P(m_1, \ldots, m_i)$ for odd
$i$. We think of $r$ as an additional sequence of bits given to Verifier V that is chosen
randomly. The output $\mathit{out}(P, V)(x, r, k)$ is defined as $m_{2k}$, where $(r; m_1, \ldots, m_{2k})$
is the unique k-round interaction with $m_1 = x$.

A language L belongs to IP if there are $V$, $P_H$ and polynomials $p_1, p_2, p_3$ s.t.
$V(r, x, m_2, \ldots, m_i)$ runs in time $p_1(|x|)$ for all $r, x, m_2, \ldots, m_i$, and, for each $x$ and
an $r \in \{0,1\}^{p_2(|x|)}$ chosen uniformly at random:


1. (Completeness) $x \in L$ implies $\mathit{out}(P_H, V)(x, r, p_3(|x|)) = 1$ with probability
1, and

2. (Soundness) $x \notin L$ implies that for all $P$ we have $\mathit{out}(P, V)(x, r, p_3(|x|)) = 1$
with probability at most $2^{-|x|}$.


Intuitively, in an interactive protocol, a computationally unbounded Prover 
interacts with a randomised polynomial-time Verifier for k rounds. In each round, 
Verifier sends probabilistic “challenges” to Prover, based on the input and the 
answers to prior challenges, and receives answers from Prover. At the end of k 
rounds, Verifier decides to accept or reject the input. The completeness property 
ensures that if the input belongs to the language L, then there is an “honest” 
Prover $P_H$ who can always convince Verifier that indeed $x \in L$. If the input does
not belong to the language, then the soundness property ensures that Verifier
rejects the input with high probability no matter how a (dishonest) Prover tries
to convince them. 

It is known that IP = PSPACE [20,27], that is, every language in PSPACE 
has a polynomial-round interactive protocol. The proof exhibits an interactive 
protocol for the language QBF of true quantified boolean formulae; in particular, 
the honest Prover is a polynomial space, exponential time algorithm that uses a 
truth table representation of the formula to implement the protocol. 


Polynomials. Interactive protocols make extensive use of polynomials over some 
prime finite field F. 

Let X be a finite set of variables. We use x,y, z,... for variables and p,q,... 
for polynomials. When we write a polynomial explicitly, we write it in brackets, 
e.g. $[3xy - z^2]$. We write 1 and 0 for the polynomials [1] and [0], respectively.
We use the following operations on polynomials: 


- Sum, difference, and product. Denoted $p + q$, $p - q$, $p \cdot q$, and defined as usual.
  For example, $[3xy - z^2] + [z^2 + yz] = [3xy + yz]$ and $[x + y] \cdot [x - y] = [x^2 - y^2]$.
- Partial evaluation. Denoted $\pi_{[x:=a]}\, p$, it returns the result of setting variable
  $x$ to the field element $a$ in the polynomial $p$, e.g. $\pi_{[x:=5]} [3xy - z^2] = [15y - z^2]$.
- Degree reduction. Denoted $\delta_x\, p$. It reduces the degree of $x$ in all monomials
  of the polynomial to 1. For example, $\delta_x [x^2 y + 3x^2 + 7z^2] = [xy + 3x + 7z^2]$.
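The three operations can be made concrete with a small sketch over a prime field. Everything below is an assumption of the illustration rather than the paper's implementation: polynomials are stored as dictionaries from monomials (sorted tuples of (variable, exponent) pairs) to coefficients, the modulus `P` is an arbitrarily chosen prime, and the function names are invented here.

```python
P = 2**61 - 1  # a prime modulus; the concrete field is an assumption of this sketch


def _clean(terms):
    """Drop monomials whose coefficient is zero in the field."""
    return {m: c for m, c in terms.items() if c}


def poly(terms):
    """Build a polynomial from {((var, exp), ...): coeff}, normalised mod P."""
    out = {}
    for mono, c in terms.items():
        key = tuple(sorted((v, e) for v, e in mono if e > 0))
        out[key] = (out.get(key, 0) + c) % P
    return _clean(out)


def add(p, q):
    r = dict(p)
    for m, c in q.items():
        r[m] = (r.get(m, 0) + c) % P
    return _clean(r)


def mul(p, q):
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            exps = dict(m1)
            for v, e in m2:
                exps[v] = exps.get(v, 0) + e       # exponents add up
            key = tuple(sorted(exps.items()))
            r[key] = (r.get(key, 0) + c1 * c2) % P
    return _clean(r)


def partial_eval(p, x, a):
    """pi_[x:=a] p: substitute the field element a for the variable x."""
    r = {}
    for m, c in p.items():
        e = dict(m).get(x, 0)
        key = tuple((v, k) for v, k in m if v != x)
        r[key] = (r.get(key, 0) + c * pow(a, e, P)) % P
    return _clean(r)


def reduce_degree(p, x):
    """delta_x p: replace every power x^k with k >= 1 by x."""
    r = {}
    for m, c in p.items():
        key = tuple((v, (min(k, 1) if v == x else k)) for v, k in m)
        r[key] = (r.get(key, 0) + c) % P
    return _clean(r)


if __name__ == "__main__":
    p = poly({(("x", 1), ("y", 1)): 3, (("z", 2),): -1})   # [3xy - z^2]
    print(partial_eval(p, "x", 5))                           # [15y - z^2], coefficients mod P
```

Negative coefficients are stored modulo `P`, so `-1` appears as `P - 1`; comparing against `poly(...)` of the expected polynomial avoids having to spell this out.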


A (partial) assignment is a (partial) mapping $\sigma\colon X \to F$. We write $\Pi_\sigma\, p$
for $\pi_{[x_1:=\sigma(x_1)]} \cdots \pi_{[x_k:=\sigma(x_k)]}\, p$, where $x_1, \ldots, x_k$ are the variables for which $\sigma$ is
defined. Additionally, we call $\sigma$ binary if $\sigma(x) \in \{0,1\}$ for each $x \in X$.


Binary and Multilinear Polynomials. A polynomial $p$ is multilinear in $x$ if
the degree of $x$ in $p$ is 0 or 1. A polynomial is multilinear if it is multilinear
in all its variables. For example, $[xy - y^2]$ is multilinear in $x$ but not in $y$, and
$[3xy - 2zy]$ is multilinear. A polynomial $p$ is binary if $\Pi_\sigma\, p \in \{0,1\}$ for every
binary assignment $\sigma$. Two polynomials $p, q$ are binary equivalent, denoted $p \equiv_b q$,
if $\Pi_\sigma\, p = \Pi_\sigma\, q$ for every binary assignment $\sigma$. (Note that non-binary polynomials
can be binary equivalent.)
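For small examples, the notions of binary polynomial and binary equivalence can be checked by brute force over all binary assignments. The sketch below makes several simplifying assumptions: polynomials are plain Python functions of their variables, the field is the integers modulo a small prime `P` chosen for illustration, and the helper names are invented for this example.

```python
from itertools import product

P = 101  # a small prime field, chosen only for illustration


def binary_assignments(variables):
    """Enumerate all assignments of 0/1 to the given variable names."""
    for bits in product((0, 1), repeat=len(variables)):
        yield dict(zip(variables, bits))


def is_binary(p, variables):
    """p is binary if it evaluates to 0 or 1 under every binary assignment."""
    return all(p(**s) % P in (0, 1) for s in binary_assignments(variables))


def binary_equivalent(p, q, variables):
    """p and q agree on every binary assignment."""
    return all((p(**s) - q(**s)) % P == 0 for s in binary_assignments(variables))


if __name__ == "__main__":
    two_xy = lambda x, y: 2 * x * y          # [2xy], not binary
    two_x2y = lambda x, y: 2 * x * x * y     # [2x^2 y], not binary
    print(is_binary(two_xy, ["x", "y"]))
    print(binary_equivalent(two_xy, two_x2y, ["x", "y"]))
```

The pair $[2xy]$ and $[2x^2y]$ is an instance of the remark in parentheses: neither polynomial is binary, yet they agree on all binary assignments.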


3 Circuits with Partial Evaluation 


We introduce circuits with partial evaluation (CP), a compact representation of 
quantified boolean formulae, and formulate #CP, the problem of counting the
number of satisfying assignments of a CP. #CP generalises QBF, the satisfiabil- 
ity problem for quantified boolean formulas. Figure 1 shows an example of a CP. 
Informally, it is a directed acyclic graph whose nodes are labelled with variables, 
boolean operators, or partial evaluation operators $\pi_{[x:=b]}$. Intuitively, $\pi_{[x:=b]}\,\varphi$ sets
the variable $x$ to the truth value $b$ in the formula $\varphi$. In this way, each node of a circuit
stands for a boolean function, and the complete circuit stands for the boolean
function of the root. Figure 1 shows the formulae represented by each node. 


Definition 1. Let $X$ denote a finite set of variables and $S \subseteq X$. A circuit with
partial evaluation and variables in $S$ (S-CP) has the form

- true, false, or $x$, where $x \in S$,
- $\neg\varphi$, $\varphi \wedge \psi$, or $\varphi \vee \psi$, where $\varphi, \psi$ are S-CPs, or
- $\pi_{[y:=b]}\,\varphi$, where $y \in X \setminus S$, $b \in \{\mathit{true}, \mathit{false}\}$, and $\varphi$ is an $(S \cup \{y\})$-CP.


The set of free variables of an S-CP $\varphi$ is $\mathit{free}(\varphi) := S$. The children of a CP are
inductively defined as follows: true, false, and $x$ have no children; the children of
$\varphi \wedge \psi$ and $\varphi \vee \psi$ are $\varphi$ and $\psi$; and the only child of $\neg\varphi$ and $\pi_{[y:=b]}\,\varphi$ is $\varphi$. The
set of descendants of $\varphi$ is the smallest set $M$ containing $\varphi$ and all children of
every element of $M$. The size of $\varphi$ is $|\varphi| := |M|$.


We represent a CP $\varphi$ as a directed acyclic graph. The nodes of the
graph are the descendants of $\varphi$. A CP $\varphi$ encodes a boolean predicate
$P_\varphi$, which maps assignments $\sigma\colon \mathit{free}(\varphi) \to \{\mathit{false}, \mathit{true}\}$ to a truth
value $P_\varphi(\sigma) \in \{\mathit{false}, \mathit{true}\}$. It does so in the obvious manner, e.g.,
$P_x(\sigma) := \sigma(x)$, $P_{\varphi \wedge \psi}(\sigma) := P_\varphi(\sigma) \wedge P_\psi(\sigma)$, etc. We use $\pi_{[x:=b]}$ as
partial evaluation operator, so $P_{\pi_{[x:=b]}\varphi}(\sigma) := P_\varphi(\sigma \cup \{x \mapsto b\})$.
Intuitively, $\pi_{[x:=b]}\,\varphi$ replaces each occurrence of $x$ in $\varphi$ by $b$. An assignment
$\sigma$ satisfies $\varphi$ if $P_\varphi(\sigma) = \mathit{true}$. We define the macros

$\forall_x \varphi := \pi_{[x:=0]}\,\varphi \wedge \pi_{[x:=1]}\,\varphi$
$\exists_x \varphi := \pi_{[x:=0]}\,\varphi \vee \pi_{[x:=1]}\,\varphi$

Fig. 1. A CP (Sect. 3), the boolean functions represented by each node (in boxes), and the
arithmetisation of the formulae (Sect. 4.1).


Figure 1 shows a CP for the quantified boolean formula $\forall_y (\neg x \vee (x \wedge y))$.
We consider the following problem:


#CP
Input: A CP $\varphi$.
Output: The number of satisfying assignments of $\varphi$.


Given a quantified boolean formula, we can use the macros for quantifiers to 
construct in linear time an equivalent CP, i.e., a CP with the same satisfying 
assignments. Similarly, #SAT instances can also be reduced to #CP. 
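The definitions of this section can be exercised on the running example with a brute-force counter. This is deliberately not the BDD-based algorithm of the paper: it enumerates all assignments to the free variables and is exponential. The nested-tuple representation of CPs and the names `holds`, `forall`, `exists`, and `count_models` are assumptions of this sketch.

```python
from itertools import product

# CP nodes as nested tuples (a representation chosen for this sketch):
#   True / False, ("var", x), ("not", c), ("and", c1, c2), ("or", c1, c2),
#   ("pi", x, b, c) for the partial evaluation pi_[x:=b] c.


def holds(cp, sigma):
    """Evaluate the predicate encoded by `cp` under the assignment `sigma`."""
    if isinstance(cp, bool):
        return cp
    tag = cp[0]
    if tag == "var":
        return sigma[cp[1]]
    if tag == "not":
        return not holds(cp[1], sigma)
    if tag == "and":
        return holds(cp[1], sigma) and holds(cp[2], sigma)
    if tag == "or":
        return holds(cp[1], sigma) or holds(cp[2], sigma)
    if tag == "pi":
        _, x, b, child = cp
        return holds(child, {**sigma, x: b})   # extend sigma with x := b
    raise ValueError(f"unknown node {tag!r}")


def forall(x, cp):
    """Quantifier macro: both partial evaluations must hold."""
    return ("and", ("pi", x, False, cp), ("pi", x, True, cp))


def exists(x, cp):
    """Quantifier macro: at least one partial evaluation must hold."""
    return ("or", ("pi", x, False, cp), ("pi", x, True, cp))


def count_models(cp, free_vars):
    """Number of satisfying assignments to `free_vars` (brute force)."""
    return sum(
        holds(cp, dict(zip(free_vars, bits)))
        for bits in product((False, True), repeat=len(free_vars))
    )


# The running example: forall y. (not x or (x and y)), with free variable x.
BODY = ("or", ("not", ("var", "x")), ("and", ("var", "x"), ("var", "y")))
EXAMPLE = forall("y", BODY)

if __name__ == "__main__":
    print(count_models(EXAMPLE, ["x"]))
```

For the example, only the assignment setting x to false is satisfying, so the count is 1; replacing the universal quantifier by an existential one makes both assignments satisfying.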


Structure of the Rest of the Paper. In Sect. 4, we give an interactive
protocol for #CP called CPCERTIFY. In Sect. 5, we implement an honest Prover
for CPCERTIFY on top of an extended BDD-based algorithm for #CP. The
prover runs in time polynomial in the size of the largest BDD for any of the
subcircuits of the initial circuit. Together, these results yield our main result,
Theorem 1, showing that any BDD-based algorithm can be modified to run an
interactive protocol with small polynomial overhead. Finally, Sect. 6 presents
empirical results.

4 An Interactive Protocol for #CP


In this section we describe an interactive protocol for #CP, following the SUMCHECK
protocol of [20]. Section 4.1 introduces arithmetisation, a technique to
transform #CP into an equivalent problem about polynomials. Section 4.2 shows
how to transform #CP into an equivalent problem about evaluating polynomials 
of low degree. Finally, Sect. 4.3 presents an interactive protocol for this problem. 


4.1 Arithmetisation 


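We define a mapping $[\![\cdot]\!]$ that assigns to each CP $\varphi$ a polynomial $[\![\varphi]\!]$ over the
variables $\mathit{free}(\varphi)$, called the arithmetisation of $\varphi$:


- $[\![\mathit{true}]\!] := 1$; $[\![\mathit{false}]\!] := 0$; $[\![x]\!] := [x]$ for every $x \in X$; and $[\![\neg\varphi]\!] := 1 - [\![\varphi]\!]$;
- $[\![\varphi \wedge \psi]\!] := [\![\varphi]\!] \cdot [\![\psi]\!]$ and $[\![\varphi \vee \psi]\!] := [\![\varphi]\!] + [\![\psi]\!] - [\![\varphi]\!] \cdot [\![\psi]\!]$;
- $[\![\pi_{[x:=b]}\,\varphi]\!] := \pi_{[x:=b]} [\![\varphi]\!]$, with $x \in \mathit{free}(\varphi)$, $b \in \{\mathit{true}, \mathit{false}\}$.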


Figure 1 also shows the polynomials corresponding to the nodes of the CP. 

Let F be a fixed prime finite field. Given an arbitrary truth assignment
$\sigma\colon X \to \{\mathit{true}, \mathit{false}\}$, let $\bar\sigma\colon X \to F$ be the binary assignment given by $\bar\sigma(x) = 1$
if $\sigma(x) = \mathit{true}$ and $\bar\sigma(x) = 0$ if $\sigma(x) = \mathit{false}$, where 0 and 1 denote the additive and
multiplicative identities in F. The mapping $[\![\cdot]\!]$ is defined to satisfy the following
property, whose proof is immediate:


Proposition 1. Let $\varphi$ be an S-CP encoding some boolean predicate $P_\varphi$. Then
$P_\varphi(\sigma) = \Pi_{\bar\sigma} [\![\varphi]\!]$ for every truth assignment $\sigma$ to S.


So, intuitively, the polynomial $[\![\varphi]\!]$ is a conservative extension of the predicate
$P_\varphi$: it returns the same values for all binary assignments. Accordingly, in the
rest of the paper we abuse language and write $\sigma$ instead of $\bar\sigma$ for the binary
assignment corresponding to the truth assignment $\sigma$.

Observe that #CP can be reformulated as follows: given a CP $\varphi$, compute
the number of binary assignments $\sigma$ s.t. $\Pi_\sigma[\![\varphi]\!] = 1$.
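
As an illustration of Sect. 4.1, the following is a minimal Python sketch (not the authors' implementation) that evaluates the arithmetisation of a small circuit at a point of a prime field and checks that it agrees with the boolean predicate on binary assignments. The tuple encoding of circuit nodes, the function names and the example circuit `phi` are assumptions made for this sketch only.

```python
from itertools import product

P = 2**61 - 1  # a prime modulus; any prime field would do for this illustration

def arith(node, sigma):
    """Evaluate the arithmetisation of `node` at the point `sigma` (variable -> field element)."""
    op = node[0]
    if op == "true":
        return 1
    if op == "false":
        return 0
    if op == "var":
        return sigma[node[1]] % P
    if op == "not":
        return (1 - arith(node[1], sigma)) % P
    if op == "and":
        return arith(node[1], sigma) * arith(node[2], sigma) % P
    if op == "or":
        a, b = arith(node[1], sigma), arith(node[2], sigma)
        return (a + b - a * b) % P
    if op == "pi":  # partial evaluation, encoded as ("pi", x, b, child)
        _, x, b, child = node
        return arith(child, {**sigma, x: int(b)})
    raise ValueError(f"unknown node type: {op}")

def holds(node, tau):
    """Evaluate the boolean predicate encoded by `node` under the truth assignment `tau`."""
    op = node[0]
    if op == "true":
        return True
    if op == "false":
        return False
    if op == "var":
        return tau[node[1]]
    if op == "not":
        return not holds(node[1], tau)
    if op == "and":
        return holds(node[1], tau) and holds(node[2], tau)
    if op == "or":
        return holds(node[1], tau) or holds(node[2], tau)
    _, x, b, child = node
    return holds(child, {**tau, x: b})

def count_models(node, variables):
    """Count satisfying assignments by summing the arithmetisation over all binary points."""
    return sum(arith(node, dict(zip(variables, bits)))
               for bits in product((0, 1), repeat=len(variables)))

# Hypothetical example: (x AND NOT y) OR pi_[z:=true](y AND z), free variables x and y.
phi = ("or",
       ("and", ("var", "x"), ("not", ("var", "y"))),
       ("pi", "z", True, ("and", ("var", "y"), ("var", "z"))))
```

The brute-force sum in `count_models` takes exponential time; it only serves to make the reformulation of #CP concrete.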


4.2 Degree Reduction 


Given a CP $\varphi$, its associated polynomial can have degree exponential in the
height of $\varphi$. Since we are ultimately interested in evaluating polynomials over
binary assignments, and since $x^2 = x$ for $x \in \{0,1\}$, we can convert polynomials
to low degree without changing their behaviour on binary assignments.

For this, we use a degree-reduction operator $\delta_x$ for every variable $x$. The
operator $\delta_x p$ reduces the exponent of all powers of $x$ in $p$ to 1. For example,
$\delta_x[x^2y + 3xy^2 - 2x^3y^2 + 4] = [xy + 3xy^2 - 2xy^2 + 4]$. Observe that
$\delta_x p \equiv_b p$, i.e. $\delta_x p$ and $p$ agree on all binary assignments.
Instead of working on the input CP directly, we first convert it into a circuit with
partial evaluation and degree reduction by inserting degree-reduction operators
after binary operations. This ensures all intermediate polynomials obtained by
arithmetisation have low degree.
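
The degree-reduction operator can be sketched in a few lines of Python. The representation of a polynomial as a dictionary from exponent tuples to integer coefficients is an assumption of this sketch (it works over the integers rather than a finite field, for readability); the example polynomial is the one from the text, with variable order (x, y).

```python
def degree_reduce(poly, var):
    """Apply delta_var: cap the exponent of the variable at index `var` at 1 in every monomial."""
    reduced = {}
    for exps, coeff in poly.items():
        capped = tuple(min(e, 1) if i == var else e for i, e in enumerate(exps))
        reduced[capped] = reduced.get(capped, 0) + coeff
    # Drop monomials whose coefficients cancelled out.
    return {m: c for m, c in reduced.items() if c != 0}

def evaluate(poly, point):
    """Evaluate a polynomial given as {exponent tuple: coefficient} at an integer point."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for value, e in zip(point, exps):
            term *= value ** e
        total += term
    return total

# x^2*y + 3*x*y^2 - 2*x^3*y^2 + 4, with exponent tuples ordered as (x, y)
p = {(2, 1): 1, (1, 2): 3, (3, 2): -2, (0, 0): 4}
q = degree_reduce(p, 0)  # delta_x p, with like terms collected: x*y + x*y^2 + 4
```

Note that `q` and `p` agree whenever x is 0 or 1, for arbitrary values of y, which is exactly the property the protocol relies on.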
Definition 2. A circuit with partial evaluation and degree reduction over the
set $S$ of variables (S-CPD) is defined in the same manner as an S-CP, extended
as follows:


- if $\varphi$ is an S-CPD and $x \in S$, then $\delta_x\varphi$ is an S-CPD,
- $[\![\delta_x\varphi]\!] := \delta_x[\![\varphi]\!]$, and
- $\varphi$ is the only child of $\delta_x\varphi$.


For an S-CPD $\varphi$ we define free($\varphi$), $|\varphi|$, children, descendants, and the graphical
representation as for S-CPs.


We convert a CP $\varphi$ into a CPD conv($\varphi$) by adding a degree-reduction
operator for each free variable before any binary operation.


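Definition 3. Given a CP $\varphi$ with free($\varphi$) = $\{x_1, \dots, x_k\}$, its associated
CPD conv($\varphi$) is inductively defined as follows:


- conv(false) := false, conv(true) := true, conv($\neg\psi$) := $\neg$conv($\psi$),
  conv($\pi_{[x:=b]}\psi$) := $\pi_{[x:=b]}$conv($\psi$), and
- conv($\psi_1 \circledast \psi_2$) := $\delta_{x_1} \cdots \delta_{x_k}$(conv($\psi_1$) $\circledast$ conv($\psi_2$)), for $\circledast \in \{\vee, \wedge\}$.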


Figure 2 shows the CPD conv($\varphi$) for the CP $\varphi$ of Fig. 1, together with
the polynomials corresponding to each node.

We collect some basic properties of CPDs: 


Lemma 1. Let $\varphi$ be a CP.

(a) $[\![\mathrm{conv}(\varphi)]\!]$ is a binary multilinear polynomial
and $[\![\mathrm{conv}(\varphi)]\!] \equiv_b [\![\varphi]\!]$.

(b) For every descendant $\psi$ of conv($\varphi$), $[\![\psi]\!]$ has
maximum degree 2.


Fig. 2. CPD and polynomials for the CP of Fig. 1.


CPDs have another useful property. Recall that given a CP $\varphi$ we are
interested in its number of satisfying assignments. The next lemma shows
that this number can be computed by evaluating the polynomial
$[\![\mathrm{conv}(\varphi)]\!]$ on a single input.


Lemma 2. A CP $\varphi$ with $n$ free variables has $m < |F|$ satisfying assignments iff
$\Pi_\sigma[\![\mathrm{conv}(\varphi)]\!] = m \cdot 2^{-n}$, where $\sigma$ is the assignment satisfying
$\sigma(x) := 2^{-1}$ in the field $F$ for every variable $x$.¹


1 Any prime field $F$ with $|F| > 2$ has an element $c$ such that $2c = 1$.
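
Lemma 2 can be checked on a small case. The sketch below, assuming the dictionary representation of polynomials and the prime 2^61 - 1 (both choices are made for illustration only), evaluates the multilinear polynomial x + y - xy of the predicate "x or y" at the point where every variable is the field inverse of 2, and recovers the number of satisfying assignments by multiplying with 2^n.

```python
P = 2**61 - 1
HALF = pow(2, -1, P)  # the field element c with 2c = 1 (requires Python 3.8+)

def evaluate_mod(poly, point):
    """Evaluate {exponent tuple: coefficient} at `point`, with all arithmetic modulo P."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff % P
        for value, e in zip(point, exps):
            term = term * pow(value, e, P) % P
        total = (total + term) % P
    return total

# Multilinear polynomial of "x or y": x + y - x*y, exponent tuples ordered as (x, y)
poly_or = {(1, 0): 1, (0, 1): 1, (1, 1): -1}
n = 2

value = evaluate_mod(poly_or, (HALF, HALF))  # equals m * 2^(-n) in the field
count = value * pow(2, n, P) % P             # recover m
```

The recovered count is only meaningful because the true number of satisfying assignments is smaller than the field size, as required by the lemma.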
4.3 CPCERTIFY: An Interactive Protocol for #CP


We describe an interactive protocol, called CPCERTIFY, for a CP $\varphi$ with $n$
free variables. Let $X$ denote the variables used in $\varphi$. Prover and Verifier fix a
finite field with at least $m + 1$ elements, where $m$ is an upper bound on the
number of assignments (e.g. $m = 2^n$). Prover tries to convince the Verifier that
$\Pi_\sigma[\![\mathrm{conv}(\varphi)]\!] = K$ for some $K \in F$.

In the protocol, Verifier challenges Prover to compute polynomials of the
form $\Pi_\sigma[\![\psi]\!]$, where $\psi$ is a node of the CPD conv($\varphi$) and
$\sigma\colon$ free($\psi$) $\to F$ is a (non-binary!) assignment; we call the expression
$\Pi_\sigma[\![\psi]\!]$ a challenge. Observe that all assignments are chosen by Verifier.
Prover answers with some $k \in F$. We call the expression $\Pi_\sigma[\![\psi]\!] = k$ a
claim, or the answer to the challenge $\Pi_\sigma[\![\psi]\!]$.

CPCERTIFY consists of an initialisation and a number of rounds, one for each
descendant of conv($\varphi$). Rounds are executed in topological order, starting at the
root, i.e. at conv($\varphi$) itself. The structure of a round for a node $\psi$ of conv($\varphi$)
depends on whether $\psi$ is an internal node (including the root), or a leaf.

At each point, Verifier keeps track of a set C of claims that must be checked. 


Initialisation. Verifier sends Prover the challenge $\Pi_\sigma[\![\mathrm{conv}(\varphi)]\!]$, where
$\sigma(x) := 2^{-1}$ for every $x \in$ free($\varphi$). Prover returns the claim
$\Pi_\sigma[\![\mathrm{conv}(\varphi)]\!] = K$ for some $K \in F$. (By Lemma 2, this amounts to claiming
that $\varphi$ has $K \cdot 2^n$ satisfying assignments.) Verifier initialises
$C := \{\Pi_\sigma[\![\mathrm{conv}(\varphi)]\!] = K\}$.


Round for an Internal Node. A round for an internal node $\psi$ runs as follows:


(a) Verifier collects all claims $\{\Pi_{\sigma_i}[\![\psi]\!] = k_i\}_{i=1}^m$ in $C$ relating to $\psi$,
    with assignments $\sigma_1, \dots, \sigma_m\colon$ free($\psi$) $\to F$ and $k_1, \dots, k_m \in F$.
    (Initially $\psi$ = conv($\varphi$) and the only claim is $\Pi_\sigma[\![\mathrm{conv}(\varphi)]\!] = K$.)

(b) If $m > 1$, Verifier interacts with Prover to compute a unique claim
    $\Pi_\sigma[\![\psi]\!] = k$ such that very likely² the claim is true only if all claims
    $\{\Pi_{\sigma_i}[\![\psi]\!] = k_i\}_{i=1}^m$ are true. For this, Verifier sends a number of
    challenges, and checks that the answers are consistent with the prior claims.
    Based on these answers, Verifier then derives new claims. (See “Description
    of step (b)” below.)

(c) Verifier interacts with Prover to compute a claim $\Pi_{\sigma'}[\![\psi']\!] = k'$ for each
    child $\psi'$ of $\psi$. This is similar to (b): if $\Pi_\sigma[\![\psi]\!] \neq k$, i.e. the unique
    claim from (b) does not hold, then very likely one of the resulting claims will
    be wrong. Depending on the type of $\psi$, the claims are computed based on the
    answers of Prover to challenges sent by Verifier. (See “Description of step (c)”
    below.)

(d) In total, Verifier removed the claims $\{\Pi_{\sigma_i}[\![\psi]\!] = k_i\}_{i=1}^m$ from $C$, and
    replaced them by one claim $\Pi_{\sigma'}[\![\psi']\!] = k'$ for each child $\psi'$ of $\psi$.


Observe that, since a node $\psi$ can be a child of several nodes, Verifier may collect
multiple claims for $\psi$, one for each parent node.


Round for a Leaf. If $\psi$ is a leaf, then $\psi = x$ for a variable $x$, or
$\psi \in \{\mathit{true}, \mathit{false}\}$. Verifier removes all claims
$\{\Pi_{\sigma_i}[\![\psi]\!] = k_i\}_{i=1}^m$ from $C$, computes the values
$c_i := \Pi_{\sigma_i}[\![\psi]\!]$, and rejects if $k_i \neq c_i$ for any $i$.


2 The precise bound on the failure probability will be given in Proposition 2.

Observe that if all claims made by Prover about leaves are true, then very
likely Prover’s initial claim is also true.


Description of Step (b). Let $\{\Pi_{\sigma_i}[\![\psi]\!] = k_i\}_{i=1}^m$ be the claims in $C$
relating to node $\psi$. Verifier and Prover conduct step (b) as follows:


(b.1) While there exists $x \in X$ s.t. $\sigma_1(x), \dots, \sigma_m(x)$ are not pairwise equal:
  (b.1.1) For every $i \in \{1, \dots, m\}$, let $\sigma'_i$ denote the partial assignment which is
          undefined on $x$ and otherwise matches $\sigma_i$. Verifier sends the challenges
          $\{\Pi_{\sigma'_i}[\![\psi]\!]\}_{i=1}^m$ to Prover. Prover answers with claims
          $\{\Pi_{\sigma'_i}[\![\psi]\!] = p_i\}_{i=1}^m$. Note that $p_1, \dots, p_m$ are univariate
          polynomials with free variable $x$.
  (b.1.2) Verifier checks whether $k_i = \pi_{[x:=\sigma_i(x)]}\, p_i$ holds for each $i$. If not,
          Verifier rejects. Otherwise, Verifier picks $r \in F$ uniformly at random and
          updates $\sigma_i(x) := r$ and $k_i := \pi_{[x:=r]}\, p_i$ for every $i \in \{1, \dots, m\}$.
(b.2) If after exiting the loop the values $k_1, \dots, k_m$ are not pairwise equal, Verifier
      rejects. Otherwise (that is, if $k_1 = k_2 = \dots = k_m$), the set $C$ now contains
      a unique claim $\Pi_\sigma[\![\psi]\!] = k$ relating to $\psi$.


Example 1. Consider the case in which $X = \{x\}$, and Prover has made two
claims, $\Pi_{\sigma_1}[\![\psi]\!] = k_1$ and $\Pi_{\sigma_2}[\![\psi]\!] = k_2$, with $\sigma_1(x) = 1$ and
$\sigma_2(x) = 2$. In step (b.1.1) we have $\sigma'_1 = \sigma'_2$ (both are the empty assignment),
and so Verifier sends the challenge $[\![\psi]\!]$ to Prover twice, who answers with
claims $[\![\psi]\!] = p_1$ and $[\![\psi]\!] = p_2$. In step (b.1.2) Verifier checks that
$p_1(1) = k_1$ and $p_2(2) = k_2$ hold, picks a random number $r$, and updates
$\sigma_1(x) := \sigma_2(x) := r$ and $k_1 := p_1(r)$, $k_2 := p_2(r)$. Now the condition of the
while loop fails, so Verifier moves to (b.2) and checks $k_1 = k_2$.
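
The single-variable situation of Example 1 can be sketched as follows. This is an illustrative Python fragment under several assumptions that are not part of the paper: there is only one variable, Prover's answers are passed in as lists of coefficients (lowest degree first), and the names `ev` and `merge_claims` are hypothetical.

```python
import random

P = 2**61 - 1

def ev(coeffs, x):
    """Horner evaluation of a univariate polynomial (coefficients low to high) modulo P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def merge_claims(claims, answers, rng):
    """One pass of step (b) for a single variable x.

    claims:  list of pairs (sigma_i(x), k_i)
    answers: Prover's univariate polynomials p_i, one per claim
    Returns the merged claim (r, k), or None if Verifier rejects.
    """
    for (point, k), p in zip(claims, answers):
        if ev(p, point) != k % P:          # consistency check of (b.1.2)
            return None
    r = rng.randrange(P)                   # fresh random field element
    new_values = {ev(p, r) for p in answers}
    if len(new_values) != 1:               # check of (b.2)
        return None
    return r, new_values.pop()

honest = [0, 1, 1]                         # hypothetical polynomial x + x^2
rng = random.Random(0)
merged = merge_claims([(1, 2), (2, 6)], [honest, honest], rng)
# A false second claim (value 7 at x = 2), backed by a different polynomial 1 + x + x^2:
cheated = merge_claims([(1, 2), (2, 7)], [honest, [1, 1, 1]], rng)
```

In the cheating run both polynomials pass the check of (b.1.2), but they differ by the constant 1 and therefore disagree at every random point, so the check of (b.2) fails. In general two distinct low-degree polynomials can agree at a few points, which is where the small failure probability comes from.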


Description of Step (c). Let $\Pi_\sigma[\![\psi]\!] = k$ be the claim computed by Verifier in
step (b). Verifier removes this claim from $C$ and replaces it by claims about the
children of $\psi$, depending on the structure of $\psi$:


(c.1) If $\psi = \psi_1 \circledast \psi_2$, for a $\circledast \in \{\vee, \wedge\}$, then Verifier sends Prover challenges
      $\Pi_\sigma[\![\psi_i]\!]$ for $i \in \{1,2\}$, and Prover sends claims $\Pi_\sigma[\![\psi_i]\!] = k_i$ back.
      Verifier checks the consistency condition
      $k = \pi_{[x:=k_1]}\pi_{[y:=k_2]}[\![x \circledast y]\!]$, rejecting if it does not hold. If the
      condition holds, the claim $\Pi_\sigma[\![\psi_i]\!] = k_i$ is added to $C$, to be checked in
      the round for $\psi_i$.

(c.2) If $\psi = \neg\psi'$, then Verifier adds the claim $\Pi_\sigma[\![\psi']\!] = 1 - k$ to the set of
      claims about $\psi'$.

(c.3) If $\psi = \pi_{[x:=b]}\psi'$, Verifier sets $\sigma' := \sigma \cup \{x \mapsto b\}$ and adds the claim
      $\Pi_{\sigma'}[\![\psi']\!] = k$ to $C$.

(c.4) If $\psi = \delta_x\psi'$, then Verifier sends Prover the challenge $\Pi_{\sigma'}[\![\psi']\!]$, where
      $\sigma'$ denotes the partial assignment which is undefined on $x$ and otherwise
      matches $\sigma$. Prover returns the claim $p := \Pi_{\sigma'}[\![\psi']\!]$. Observe that $p$
      is a univariate polynomial over $x$. Verifier checks the consistency condition
      $\pi_{[x:=\sigma(x)]}\delta_x\, p = k$, rejecting if it does not hold. If it holds, Verifier
      picks an $r \in F$ uniformly at random, conducts the updates $\sigma(x) := r$ and
      $k := \pi_{[x:=r]}\, p$, and adds $\Pi_\sigma[\![\psi']\!] = k$ to the set of claims about $\psi'$.
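
The two consistency conditions of step (c) that involve arithmetic, (c.1) and (c.4), can be sketched as small Python checks. As before, this is an illustration under assumptions: univariate polynomials are coefficient lists (lowest degree first), the field is the integers modulo 2^61 - 1, and all function names are hypothetical.

```python
P = 2**61 - 1

def ev(coeffs, x):
    """Horner evaluation of a univariate polynomial (coefficients low to high) modulo P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def check_binary(op, k, k1, k2):
    """Consistency condition of (c.1): k must equal the arithmetised operator applied to k1, k2."""
    if op == "and":
        expected = k1 * k2
    elif op == "or":
        expected = k1 + k2 - k1 * k2
    else:
        raise ValueError(f"unknown operator: {op}")
    return (k - expected) % P == 0

def delta(p):
    """Degree reduction of a univariate polynomial: every power x^j with j >= 1 becomes x."""
    return [p[0] % P, sum(p[1:]) % P] if p else [0]

def check_degree_reduction(p, sigma_x, k, r):
    """Step (c.4): check pi_[x:=sigma(x)] delta_x p = k; on success return the new value pi_[x:=r] p.

    Returns None if Verifier rejects. The random element r is passed in explicitly here.
    """
    if ev(delta(p), sigma_x) != k % P:
        return None
    return ev(p, r)
```

For instance, with the hypothetical answer p = x^2, the claim k = 5 at sigma(x) = 5 is consistent (since delta_x p = x), and the new claim for the child at r = 7 has value 49.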
This concludes the description of the interactive protocol. We now show
CPCERTIFY is complete and sound.


Proposition 2 (CPCERTIFY is complete and sound). Let $\varphi$ be a CP with
$n$ free variables. Let $\Pi_\sigma[\![\mathrm{conv}(\varphi)]\!] = K$ be the claim initially sent by Prover to
Verifier. If the claim is true, then Prover has a strategy to make Verifier accept.
If not, for every Prover, Verifier accepts with probability at most $4n|\varphi|/|F|$.


If the original claim is correct, Prover can answer every challenge truthfully 
and all claims pass all of Verifier’s checks. So Verifier accepts. If the claim is not 
correct, we proceed round by round. We bound the probability that the Verifier 
is tricked in a single step to at most 2/|F| using the Schwartz-Zippel Lemma. 
We then bound the number of such steps to $2n|\varphi|$ and use a union bound.
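
To get a feeling for the size of the bound in Proposition 2, the following Python fragment evaluates it for made-up parameters. The values n = 10 and a circuit size of 100 are purely hypothetical; the field size 2^61 - 1 is the one the paper reports using in its implementation (Sect. 5.3).

```python
from math import log10

def soundness_bound(n, circuit_size, field_size):
    """Upper bound 4 * n * |phi| / |F| on the probability that Verifier accepts a false claim."""
    return 4 * n * circuit_size / field_size

FIELD_SIZE = 2**61 - 1
bound = soundness_bound(n=10, circuit_size=100, field_size=FIELD_SIZE)
exponent = log10(bound)      # order of magnitude of the error for one run
two_runs = bound ** 2        # independent repetitions multiply the error probabilities
```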


5 A BDD-Based Prover 


We assume familiarity with reduced ordered binary decision diagrams (BDDs)
[9]. We use BDDs over $X = \{x_1, \dots, x_n\}$. We fix the variable order
$x_1 < x_2 < \dots < x_n$, i.e. the root node would decide based on the value of $x_n$.


Definition 4. BDDs are defined inductively as follows:


- $\langle\mathit{true}\rangle$ and $\langle\mathit{false}\rangle$ are BDDs of level 0;
- if $u \neq v$ are BDDs of level $\ell_u, \ell_v$ and $i > \ell_u, \ell_v$, then $\langle x_i, u, v\rangle$ is a BDD of
  level $i$;
- we identify $\langle x_i, u, u\rangle$ and $u$, for a BDD $u$ of level $\ell_u$ and $i > \ell_u$.


The level of a BDD $w$ is denoted $\ell(w)$. The set of descendants of $w$ is the
smallest set $S$ with $w \in S$ and $u, v \in S$ for all $\langle x, u, v\rangle \in S$. The size $|w|$ of $w$ is
the number of its descendants.

The arithmetisation of a BDD $w$ is the polynomial $[\![w]\!]$ defined as follows:
$[\![\langle\mathit{true}\rangle]\!] := 1$, $[\![\langle\mathit{false}\rangle]\!] := 0$ and
$[\![\langle x, u, v\rangle]\!] := [1 - x] \cdot [\![u]\!] + [x] \cdot [\![v]\!]$.
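
The arithmetisation of a BDD can be evaluated at a field point by a memoised traversal. The sketch below assumes a tuple encoding (x, u, v) of BDD nodes with Python booleans as leaves, which is a choice made for this illustration; the example BDD encodes "exactly two of x, y, z are true", whose polynomial is xy + yz + xz - 3xyz.

```python
P = 2**61 - 1

def bdd_eval(w, sigma, memo=None):
    """Evaluate the arithmetisation of the BDD `w` at the point `sigma` (variable -> field element)."""
    if memo is None:
        memo = {}
    if isinstance(w, bool):            # leaves: true -> 1, false -> 0
        return int(w)
    if w not in memo:
        x, lo, hi = w                  # node <x, u, v>: u is the 0-child, v the 1-child
        s = sigma[x] % P
        memo[w] = ((1 - s) * bdd_eval(lo, sigma, memo) + s * bdd_eval(hi, sigma, memo)) % P
    return memo[w]

z = ("z", False, True)
not_z = ("z", True, False)
# x = 0: y and z must both hold; x = 1: exactly one of y, z must hold.
exactly_two = ("x", ("y", False, z), ("y", z, not_z))

half = pow(2, -1, P)
value_at_half = bdd_eval(exactly_two, {"x": half, "y": half, "z": half})
count = value_at_half * 8 % P          # multiply by 2^n with n = 3, as in Lemma 2
```

Because every node is visited once, the evaluation takes time linear in the size of the BDD when the point assigns all variables.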


Figure 3 shows a BDD for the boolean function
$\varphi(x,y,z) = (x \wedge y \wedge \neg z) \vee (\neg x \wedge y \wedge z) \vee (x \wedge \neg y \wedge z)$
and the arithmetisation of each node.


BDDSOLVER: A BDD-based Algorithm for #CP. An instance $\varphi$ of #CP
can be solved using BDDs. Starting at the leaves of $\varphi$, we iteratively compute
a BDD for each node $\psi$ of the circuit encoding the boolean predicate $P_\psi$. At
the end of this procedure we obtain a BDD for $P_\varphi$. The number of satisfying
assignments of $\varphi$ is the number of accepting paths of the BDD, which can be
computed in linear time in the size of the BDD.

For a node $\psi = \psi_1 \circledast \psi_2$, given BDDs representing the predicates $P_{\psi_1}$ and
$P_{\psi_2}$, we compute a BDD for the predicate $P_\psi := P_{\psi_1} \circledast P_{\psi_2}$ using the
Apply$_\circledast$ operator on BDDs. We name this algorithm for solving #CP “BDDSOLVER.”
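
For readers less familiar with Apply, here is a minimal recursive sketch in Python. It is a textbook-style depth-first version and not the breadth-first variant developed later in the paper. The tuple encoding of nodes, the `ORDER` table of levels and the example inputs (BDDs for x xor y and for x and y) are assumptions of this sketch.

```python
ORDER = {"x": 2, "y": 1}  # hypothetical levels: a higher level is closer to the root

def level(w):
    return 0 if isinstance(w, bool) else ORDER[w[0]]

def mk(var, lo, hi):
    """Create the node <var, lo, hi>, identifying <var, u, u> with u (reduction rule)."""
    return lo if lo == hi else (var, lo, hi)

def cofactors(w, var):
    """Return the 0- and 1-child of w with respect to var (w itself if it does not test var)."""
    if not isinstance(w, bool) and w[0] == var:
        return w[1], w[2]
    return w, w

def apply_op(op, u, v, memo=None):
    """Compute a BDD for op(u, v), where op is a function on Python booleans."""
    if memo is None:
        memo = {}
    if isinstance(u, bool) and isinstance(v, bool):
        return op(u, v)
    key = (u, v)
    if key not in memo:
        var = u[0] if level(u) >= level(v) else v[0]   # split on the topmost variable
        u0, u1 = cofactors(u, var)
        v0, v1 = cofactors(v, var)
        memo[key] = mk(var, apply_op(op, u0, v0, memo), apply_op(op, u1, v1, memo))
    return memo[key]

y_node = ("y", False, True)
not_y = ("y", True, False)
xor_xy = ("x", y_node, not_y)      # x xor y
and_xy = ("x", False, y_node)      # x and y
result = apply_op(lambda a, b: a or b, xor_xy, and_xy)   # BDD for x or y
```

The memo table ensures that each pair of nodes is processed once, which is the source of the usual bound of |u| times |v| on the running time.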
From BDDSOLVER to CPCERTIFY. Our goal is to modify BDDSOLVER to play the
role of an honest Prover in CPCERTIFY with minimal overhead. In CPCERTIFY,
Prover repeatedly performs the same task: evaluate polynomials of the form
$\Pi_\sigma[\![\psi]\!]$, where $\psi$ is a descendant of the CPD conv($\varphi$), and $\sigma$ assigns values
to all free variables of $\psi$ except possibly one. Therefore, the polynomials have
at most one free variable and, as we have seen, degree at most 2.

Before defining the concepts precisely, we give a brief overview of this section.


- First (Proposition 3), we show that BDDs correspond to binary multilinear
  polynomials. In particular, BDDs allow for efficient evaluation of the
  polynomial. As argued in Lemma 1(a), for every descendant $\psi$ of $\varphi$, the CPD
  conv($\psi$) (which is a descendant of conv($\varphi$)) evaluates to a multilinear
  polynomial. In particular, Prover can use standard BDD algorithms to calculate
  the corresponding polynomials $\Pi_\sigma[\![\psi]\!]$ for all descendants $\psi$ of conv($\varphi$)
  that are neither binary operators nor degree reductions.

- Second (the rest of the section), we prove a surprising connection: the
  intermediate results obtained while executing the BDD algorithms (with slight
  adaptations) correspond precisely to the remaining descendants of conv($\varphi$).


Fig. 3. A BDD and its arithmetisation. For $\langle x, u, v\rangle$, we denote the link from $x$
to $v$ with a solid edge and $x$ to $u$ with a dotted edge. We omit links to
$\langle\mathit{false}\rangle$.


The following proposition proves that BDDs represent exactly the binary
multilinear polynomials.


Proposition 3. (a) For a BDD $w$, $[\![w]\!]$ is a binary multilinear polynomial. (b)
For a binary multilinear polynomial $p$ there is a unique BDD $w$ s.t. $p = [\![w]\!]$.


5.1 Extended BDDs 


During the execution of CPCERTIFY for a given CPD conv($\varphi$), Prover sends
to Verifier claims of the form $\Pi_\sigma[\![\psi]\!]$, where $\psi$ is a descendant of conv($\varphi$), and
$\sigma\colon X \to F$ is a partial assignment. While all polynomials computed by
CPCERTIFY are binary, not all are multilinear: some polynomials have degree 2. For
these polynomials, we introduce extended BDDs (eBDDs) and give eBDD-based
algorithms for the following two tasks:


1. Compute an eBDD representing $[\![\psi]\!]$ for every node $\psi$ of conv($\varphi$).
2. Given an eBDD for $[\![\psi]\!]$ and a partial assignment $\sigma$, compute $\Pi_\sigma[\![\psi]\!]$.
Computing eBDDs for CPDs: Informal Introduction. Consider a CP $\varphi$ and its
associated CPD conv($\varphi$). Each node of $\varphi$ induces a chain of nodes in conv($\varphi$),
consisting of degree-reduction nodes $\delta_{x_1}, \dots, \delta_{x_n}$, followed by the node
itself (see Fig. 4). Given BDDs $u$ and $v$ for the children of the node in the
CP, we can compute a BDD for the node itself using a well-known BDD algorithm
Apply$_\circledast(u, v)$ parametric in the boolean operation $\circledast$ labelling the node [9]. Our
goal is to transform Apply$_\circledast$ into an algorithm that computes eBDDs for all nodes
in the chain, i.e. eBDDs for all the polynomials $p_0, p_1, \dots, p_n$ of Fig. 4.


Fig. 4. A node of a CP ($\circledast$) gets a chain of degree reduction nodes in the
associated CPD.


Roughly speaking, Apply$_\circledast(u, v)$ recursively computes BDDs
$w_0 = \mathrm{Apply}_\circledast(u_0, v_0)$ and $w_1 = \mathrm{Apply}_\circledast(u_1, v_1)$, where $u_b$ and $v_b$ are
the $b$-children of $u$ and $v$, and then returns the BDD with $w_0$ and $w_1$ as 0- and
1-child, respectively.³

Most importantly, we modify Apply$_\circledast$ to run in breadth-first order. Figure 5
shows a graphical representation of a run of Apply$_\vee(u, v)$, where $u$ and $v$ are the
two BDD nodes labelled by $x$. Square nodes represent pending calls to Apply$_\circledast$.
Initially there is only one square call Apply$_\vee(u, v)$ (Fig. 5, top left). Apply$_\vee$ calls
itself recursively for $u_0, v_0$ and $u_1, v_1$ (Fig. 5, top right). Each of the two calls
splits again into two; however, the first three are identical (Fig. 5, bottom left),
and so reduce to two. These two calls can now be resolved directly; they return
nodes true and false, respectively. At this point, the children of Apply$_\vee(u, v)$
become $\langle y, \mathit{true}, \mathit{true}\rangle = \mathit{true}$, and $\langle y, \mathit{true}, \mathit{false}\rangle$, which exists already as well
(Fig. 5, bottom right).

We look at the diagrams of Fig. 5 not as a visualisation aid, but as graphs
with two kinds of nodes: standard BDD nodes, represented as circles, and product
nodes, represented as squares. We call them extended BDDs. Each node of an
extended BDD is assigned a polynomial in the expected way: the polynomial
$[\![u]\!]$ of a standard BDD node $u$ with variable $x$ is $x \cdot [\![u_1]\!] + (1 - x) \cdot [\![u_0]\!]$, the
polynomial $[\![v]\!]$ of a square $\wedge$-node $v$ is $[\![v_0]\!] \cdot [\![v_1]\!]$, etc. In this way we assign to
each eBDD a polynomial. In particular, we obtain the intermediate polynomials
$p_0, p_1, p_2, p_3$ of the figure, one for each level in the recursion. In the rest of the
section we show that these are precisely the polynomials $p_0, p_1, \dots, p_n$ of Fig. 4.

Thus, in order to compute eBDDs for all nodes of a CPD conv($\varphi$), it suffices
to compute BDDs for all nodes of the CP $\varphi$. Since we need to do this anyway
to solve #CP, the polynomial certification does not incur any overhead.


3 In fact, this is only true when $u$ and $v$ are nodes at the same level and
$\mathrm{Apply}_\circledast(u_0, v_0) \neq \mathrm{Apply}_\circledast(u_1, v_1)$, but at this point we only want to convey some
intuition.

Fig. 5. Run of Apply$_\vee(u, v)$, but with recursive calls evaluated in breadth-first order.
All missing edges go to node false.


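Extended BDDs. As for BDDs, we define eBDDs over $X = \{x_1, \dots, x_n\}$ with
the variable order $x_1 < x_2 < \dots < x_n$.


Definition 5. Let $\circledast$ be a binary boolean operator. The set of eBDDs (for $\circledast$) is
inductively defined as follows:


- every BDD is also an eBDD of the same level;
- if $u, v$ are BDDs (not eBDDs!), then $\langle u \circledast v\rangle$ is an eBDD of level $\ell$ where
  $\ell := \max\{\ell(u), \ell(v)\}$; we call eBDDs of this form product nodes;
- if $u \neq v$ are eBDDs and $i > \ell(u), \ell(v)$, then $\langle x_i, u, v\rangle$ is an eBDD of level $i$;
- we identify $\langle x_i, u, u\rangle$ and $u$ for an eBDD $u$ and $i > \ell(u)$.


The set of descendants of an eBDD $w$ is the smallest set $S$ with $w \in S$ and
$u, v \in S$ for all $\langle u \circledast v\rangle, \langle x, u, v\rangle \in S$. The size of $w$ is its number of descendants.
For $u, v \in \{\langle\mathit{true}\rangle, \langle\mathit{false}\rangle\}$ we identify $\langle u \circledast v\rangle$ with $\langle\mathit{true}\rangle$ or $\langle\mathit{false}\rangle$
according to the result of $\circledast$, e.g. $\langle\langle\mathit{true}\rangle \vee \langle\mathit{false}\rangle\rangle = \langle\mathit{true}\rangle$, as
true $\vee$ false = true. The arithmetisation of an eBDD for a boolean operator
$\circledast \in \{\wedge, \vee\}$ is defined as for BDDs, with the extensions
$[\![\langle u \wedge v\rangle]\!] = [\![u]\!] \cdot [\![v]\!]$ and $[\![\langle u \vee v\rangle]\!] = [\![u]\!] + [\![v]\!] - [\![u]\!] \cdot [\![v]\!]$.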


Example 2. The diagrams in Fig. 5 are eBDDs for $\circledast := \vee$. Nodes of the form
$\langle x, u, v\rangle$ and $\langle u \vee v\rangle$ are represented as circles and squares, respectively.
Consider the top-left diagram. Abbreviating $x \oplus y := (x \wedge \neg y) \vee (\neg x \wedge y)$ we get
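$[\![\mathrm{Apply}(u, v)]\!] = [\![\langle x \oplus y\rangle \wedge \langle x \wedge y\rangle]\!] = [\![x \oplus y]\!] \cdot [\![x \wedge y]\!]
= (x(1 - y) + (1 - x) \cdot y - xy(1 - x)(1 - y)) \cdot xy$, which is the polynomial $p_0$ shown in the figure.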
Table 1. On the left: Algorithm computing eBDDs for the sequence $[\![w]\!]$, $\delta_{x_n}[\![w]\!]$,
$\delta_{x_{n-1}}\delta_{x_n}[\![w]\!]$, ..., $\delta_{x_1} \cdots \delta_{x_n}[\![w]\!]$ of polynomials. On the right: Recursive
algorithm to evaluate the polynomial represented by an eBDD at a given partial
assignment. $P(w)$ is a mapping used to memoize the polynomials returned by
recursive calls.


Left: COMPUTEEBDD($w$)
  Input: eBDD $w$
  Output: sequence $w_0, \dots, w_n$ of eBDDs
  $w_0 := w$; output $w_0$
  for $i = 0, \dots, \ell(w) - 1$ do
    $w_{i+1} := w_i$
    for every node $\langle u \circledast v\rangle$ of $w_i$ at level $n - i$ do
      for $b \in \{0, 1\}$ do
        $u_b := \pi_{[x_{n-i}:=b]}\, u$
        $v_b := \pi_{[x_{n-i}:=b]}\, v$
        $t_b := \langle u_b \circledast v_b\rangle$
      $w_{i+1} := w_{i+1}[\langle u \circledast v\rangle / \langle x_{n-i}, t_0, t_1\rangle]$
    output $w_{i+1}$

Right: EVALUATEEBDD($w, \sigma$) =: $E_\sigma(w)$
  Input: eBDD $w$; assignment $\sigma\colon X \to F$
  Output: $\Pi_\sigma[\![w]\!]$
  if $P(w)$ is defined return $P(w)$
  if $w \in \{\langle\mathit{true}\rangle, \langle\mathit{false}\rangle\}$ return $[\![w]\!]$
  if $w = \langle u \wedge v\rangle$
    $P(w) := E_\sigma(u) \cdot E_\sigma(v)$
  if $w = \langle u \vee v\rangle$
    $P(w) := E_\sigma(u) + E_\sigma(v) - E_\sigma(u) E_\sigma(v)$
  if $w = \langle x, u, v\rangle$ and $\sigma(x)$ undefined
    $P(w) := [1 - x] \cdot E_\sigma(u) + [x] \cdot E_\sigma(v)$
  if $w = \langle x, u, v\rangle$ and $\sigma(x) = s \in F$
    $P(w) := [1 - s] \cdot E_\sigma(u) + [s] \cdot E_\sigma(v)$
  return $P(w)$


Computing eBDDs for CPDs. Given a node of a CP corresponding to a
binary operator $\circledast$, Prover has to compute polynomials
$p_0, \delta_{x_n} p_0, \dots, \delta_{x_1} \cdots \delta_{x_n} p_0$ corresponding to the nodes of the CPD shown on
the right. We show that Prover can compute these polynomials by representing
them as eBDDs. Table 1 describes an algorithm that gets as input an eBDD $w$ of
level $n$, and outputs a sequence $w_0, w_1, \dots, w_n$ of eBDDs such that $w_0 = w$;
$[\![w_{i+1}]\!] = \delta_{x_{n-i}}[\![w_i]\!]$ for every $0 \le i \le \ell(w) - 1$; and $w_n$ is a BDD.
Interpreted as sequence of eBDDs, Fig. 5 shows a run of this algorithm.


Notation. Given an eBDD $w$ and eBDDs $u, v$ such that $\ell(u) \ge \ell(v)$, we let $w[u/v]$
denote the result of replacing $u$ by $v$ in $w$. For an eBDD $w = \langle x_i, w_0, w_1\rangle$ and
$b \in \{0, 1\}$ we define $\pi_{[x_i:=b]}\, w := w_b$, and for $j > i$ we set $\pi_{[x_j:=b]}\, w := w$. (Note
that $[\![\pi_{[x_j:=b]}\, w]\!] = \pi_{[x_j:=b]}[\![w]\!]$ holds for any $j$ where it is defined.)


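Proposition 4. Let $\psi_1, \psi_2$ denote CPs and $u_1, u_2$ BDDs with $[\![u_i]\!] = [\![\psi_i]\!]$,
$i \in \{1, 2\}$. Let $w := \langle u_1 \circledast u_2\rangle$ denote an eBDD. Then COMPUTEEBDD($w$) satisfies
$[\![w_0]\!] = [\![\psi_1 \circledast \psi_2]\!]$ and $[\![w_{i+1}]\!] = \delta_{x_{n-i}}[\![w_i]\!]$ for every $0 \le i \le n - 1$;
moreover, $w_n$ is a BDD with $w_n = \mathrm{Apply}_\circledast(u_1, u_2)$. Finally, the algorithm runs in
time $O(T)$, where $T \in O(|u_1| \cdot |u_2|)$ is the time taken by $\mathrm{Apply}_\circledast(u_1, u_2)$.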


Evaluating Polynomials Represented as eBDDs. Recall that Prover must
evaluate expressions of the form $\Pi_\sigma[\![\psi]\!]$ for some CPD $\psi$, where $\sigma$ assigns values
to all variables of $\psi$ except for possibly one. We give an algorithm to evaluate
arbitrary expressions $\Pi_\sigma[\![w]\!]$, where $w$ is an eBDD, and show that if there is
at most one free variable then the algorithm takes linear time in the size of $w$.
The algorithm is shown on the right of Table 1. It has the standard structure of
BDD procedures: it recurs on the structure of the eBDD, memoizing the result
of recursive calls so that the algorithm is called at most once with a given input.

Proposition 5. Let $w$ denote an eBDD, $\sigma\colon X \to F$ a partial assignment, and
$k$ the number of variables assigned by $\sigma$. Then EVALUATEEBDD evaluates the
polynomial $\Pi_\sigma[\![w]\!]$ in time $O(\mathrm{poly}(2^{n-k}) \cdot |w|)$.
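
The evaluation algorithm on the right of Table 1 can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not from the paper: decision nodes are tuples ("node", x, u, v), product nodes are ("and", u, v) or ("or", u, v), leaves are Python booleans, and at most one variable is left unassigned, so that every intermediate result is a univariate polynomial stored as a coefficient list (lowest degree first).

```python
P = 2**61 - 1

def p_add(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [(x + y) % P for x, y in zip(a, b)]

def p_neg(a):
    return [(-x) % P for x in a]

def p_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def evaluate_ebdd(w, sigma, cache=None):
    """Return the coefficients of Pi_sigma[[w]], assuming at most one variable is unassigned."""
    if cache is None:
        cache = {}
    if isinstance(w, bool):
        return [int(w)]
    if w in cache:                               # memoisation, the mapping P(w) of Table 1
        return cache[w]
    if w[0] in ("and", "or"):                    # product node <u op v>
        a = evaluate_ebdd(w[1], sigma, cache)
        b = evaluate_ebdd(w[2], sigma, cache)
        prod = p_mul(a, b)
        res = prod if w[0] == "and" else p_add(p_add(a, b), p_neg(prod))
    else:                                        # decision node <x, u, v>
        _, x, u, v = w
        s = [sigma[x] % P] if x in sigma else [0, 1]   # [0, 1] is the polynomial x
        one_minus_s = p_add([1], p_neg(s))
        res = p_add(p_mul(one_minus_s, evaluate_ebdd(u, sigma, cache)),
                    p_mul(s, evaluate_ebdd(v, sigma, cache)))
    cache[w] = res
    return res

y_node = ("node", "y", False, True)
not_y = ("node", "y", True, False)
u = ("node", "x", y_node, not_y)                 # BDD for x xor y
v = ("node", "x", False, y_node)                 # BDD for x and y
result = evaluate_ebdd(("or", u, v), {"y": 3})   # polynomial in x: 3 - 11x + 15x^2
```

The result has degree 2 in x, which illustrates why plain BDDs (multilinear polynomials) do not suffice for the intermediate nodes of the CPD.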


5.2 Efficient Certification 


In the CPCERTIFY algorithm, Prover must (a) compute polynomials for all 
nodes of the CPD, and (b) evaluate them on assignments chosen by Verifier. 
In the last section we have seen that COMPUTEEBDD (for binary operations 
of the CP), combined with standard BDD algorithms (for all other operations), 
yields eBDDs representing all these polynomials—at no additional overhead, 
compared to a BDD-based implementation. This covers part (a). Regarding (b), 
recall that all polynomials computed in (a) have at most one variable. Therefore, 
using EVALUATEEBDD we can evaluate a polynomial in linear time in the size 
of the eBDD representing it. 

The Verifier of CPCERTIFY is implemented in a straightforward manner. As the
algorithm runs in time polynomial in the size of the CP (and not of the computed
BDDs, which may be exponentially larger), incurring overhead is less of a concern.


Theorem 1 (Main Result). If BDDSOLVER solves an instance $\varphi$ of #CP
with $n$ variables in time $T$, with $T > n|\varphi|$, then


(a) Prover computes eBDDs for all nodes of conv($\varphi$) in time $O(T)$,

(b) Prover responds to Verifier’s challenges in time $O(nT)$, and

(c) Verifier executes CPCERTIFY in time $O(n^2|\varphi|)$, with failure probability at
most $4n|\varphi|/|F|$.


As presented above, EVALUATEEBDD incurs a factor-of-n overhead, as every 
node of the CPD must be evaluated. In our implementation, we use a caching 
strategy to reduce the complexity of Theorem 1(b) to O(T). 

Note that the bounds above assume a uniform cost model. In particular, 
operations on BDD nodes and finite field arithmetic are assumed to be O(1). 
This is a reasonable assumption, as for a constant failure probability
$\log |F| \approx \log n$. Hence the finite field remains small. (It is possible to verify the number of
assignments even if it exceeds |F|, see below.) 


5.3 Implementation Concerns 


We list a number of points that are not described in detail in this paper, but 
need to be considered for an efficient implementation. 


Finite Field Arithmetic. It is not necessary to use large finite fields. In
particular, one can avoid the overhead of arbitrarily sized integers. For our
implementation we fix the finite field $F := \mathbb{Z}_p$, with $p = 2^{61} - 1$ (the largest
Mersenne prime to fit in 64 bits).
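
One reason Mersenne primes are convenient is that reduction modulo 2^61 - 1 needs only shifts, masks and additions. The following Python sketch shows the standard folding trick; it is not taken from the blic sources, and the function names are hypothetical. Python integers are unbounded, so the sketch only demonstrates the arithmetic; a C++ implementation would additionally need a 128-bit intermediate for the product.

```python
import random

P = (1 << 61) - 1  # the Mersenne prime 2^61 - 1

def reduce_mersenne(x):
    """Reduce 0 <= x < 2^122 modulo P using 2^61 = 1 (mod P) instead of a division."""
    x = (x & P) + (x >> 61)   # fold the high bits onto the low 61 bits
    x = (x & P) + (x >> 61)   # a second fold leaves a value of at most P
    return x - P if x >= P else x

def mul_mod(a, b):
    """Multiply two field elements 0 <= a, b < P."""
    return reduce_mersenne(a * b)
```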


Incremental eBDD Representation. Algorithm COMPUTEEBDD computes
a sequence of eBDDs. These must not be stored explicitly, otherwise one incurs
a space-overhead. Instead, we only store the last eBDD as well as the differences
between each subsequent element of the sequence. To evaluate the eBDDs, we
then revert to a previous state by applying the differences appropriately.


Evaluation Order. It simplifies the implementation if Prover only needs to
evaluate nodes of the CPD in some (fixed) topological order. CPCERTIFY can
easily be adapted to guarantee this, by picking the next node appropriately in
each iteration, and by evaluating only one child of a binary operator $\psi_1 \circledast \psi_2$.
The value of the other child can then be derived by solving a linear equation.


Efficient Evaluation. As stated in Theorem 1, using EVALUATEEBDD Prover
needs $O(nT)$ time to respond to Verifier’s challenges. In our implementation
we instead use a caching strategy that reduces this time to $O(T)$. Essentially,
we exploit the special structure of conv($\varphi$): Verifier sends a sequence of
challenges $\Pi_{\sigma_1}\delta_{x_1} \cdots \delta_{x_n} w$, $\Pi_{\sigma_2}\delta_{x_2} \cdots \delta_{x_n} w$, ..., $\Pi_{\sigma_{n+1}} w$, where
assignments $\sigma_i$ and $\sigma_{i+1}$ differ only in variables $x_i$ and $x_{i+1}$. The corresponding
eBDDs likewise change only at levels $i$ and $i + 1$. We cache the linear coefficients
of eBDD nodes that contribute to the arithmetisation of the root top-down, and
the arithmetised values of nodes bottom-up. As a result, only levels $i, i + 1$ need
to be updated.


Large Numbers of Assignments. If the number of satisfying assignments 
of a CP exceeds |F|, Verifier would not be able to verify the count accurately. 
Instead of choosing |F| > 2”, which incurs a significant overhead, Verifier can 
query the precise number of assignments, and then choose |F| randomly. This 
introduces another possibility of failure, but (roughly speaking) it suffices to 
double log |F| for the additional failure probability to match the existing one. 
Our implementation does not currently support this technique. 


6 Evaluation 


We have implemented an eBDD library, blic (BDD Library with Interactive
Certification)⁴, that is a drop-in replacement for BDDs but additionally performs
the role of Prover in the CPCERTIFY protocol. We have also implemented a
client that executes the protocol as Verifier. The eBDD library is about 900
lines of C++ code and the CPCERTIFY protocol is about 400 lines. We have
built a prototype certifying QBF solver in blic, totalling about 2600 lines of code.
We aim to answer the following questions in our evaluation:


RQ1. Is a QBF solver with CPCERTIFY-based certification competitive? If so, 
how high is the overhead of implementing CPCERTIFY on top of the 
BDD operations? 

RQ2. What is the amount of communication for Prover and Verifier in
executing the CPCERTIFY protocol, what is the time requirement for Verifier,
and how do these numbers compare to proof sizes and proof checking
times for certificates based on resolution and other proof systems?


4 https://gitlab.lrz.de/i7/blic.
Fig. 6. (a) Time taken on instances (dashed lines are y = 100x and y = 0.01x), (b)
Cost of generating a certificate over computing the solution, (c) Time to verify the
certificate, (d) Size of certificates


RQ1: Performance of blic. We compare blic with CAQE, DepQBF, and PGBDDQ,
three state-of-the-art QBF solvers. CAQE [10,29] does not provide any
certificates in its most recent version. DepQBF [12,19] is a certifying QBF solver.
PGBDDQ [7,25] is an independent implementation of a BDD-based QBF solver.
Both DepQBF and PGBDDQ provide specialised checkers for their certificates,
though PGBDDQ can also produce proofs in standard QRAT format. Note that
PGBDDQ is written in Python and generates proofs in an ASCII-based format,
incurring overhead compared to the other tools.

We take 172 QBF instances (all unsatisfiable) from the Crafted Instances
track of the QBF Evaluation 2022.⁵ The Prenex CNF track of the QBF
competition is not evaluated here. It features instances with a large number of
variables. BDD-based solvers perform poorly under these circumstances without
additional optimisations. Our overall goal is not to propose a new approach for


5 CAQE and DepQBF were the winner and runner-up in this category. The configu- 
ration we used differs from the competition, as described in the full version of the 
paper [11]. 
Table 2. Comparison of certificate generation, bytes exchanged between prover and
verifier, and time taken to verify the certificate on a set of QBF benchmarks from
[7]. “Solve time” is time taken to solve the instance and to generate a certificate
(seconds), “Certificate” is the size of proof encoding for PGBDDQ, and bytes exchanged by
CPCERTIFY for blic, and “Verifier time” is time to verify the certificate (Verifier’s run
time for blic and time taken by qchecker).


Instance     | Solve time (s)    | Certificate (MiB) | Verifier time (s)
n  | result  | blic  | PGBDDQ    | blic   | PGBDDQ   | blic | qchecker
10 | sat     | 0.03  | 3.67      | 1.20   | 8.48     | 0.01 | 3.80
10 | unsat   | 0.03  | 3.66      | 1.20   | 8.45     | 0.01 | 3.83
15 | sat     | 0.13  | 18.07     | 4.12   | 44.25    | 0.02 | 18.45
15 | unsat   | 0.13  | 18.14     | 4.11   | 44.20    | 0.02 | 18.55
20 | sat     | 0.54  | 82.92     | 11.59  | 198.54   | 0.07 | 80.28
20 | unsat   | 0.53  | 83.02     | 11.64  | 198.76   | 0.06 | 79.05
25 | sat     | 1.56  | 261.16    | 23.94  | 566.95   | 0.14 | 238.99
25 | unsat   | 1.55  | 261.25    | 23.86  | 565.36   | 0.15 | 237.94
40 | sat     | 25.22 | 4863.71   | 132.43 | 7464.96  | 0.95 | 5141.08
40 | unsat   | 25.25 | 4827.06   | 132.67 | 7467.84  | 0.99 | 5463.54


solving QBF, but rather to certify a BDD-based approach, so we wanted to focus 
on cases where the existing BDD-based approaches are practical. 

We ran each benchmark with a 10 min timeout; all tools other than CAQE
were run with certificate production. All times were obtained on a machine
with an Intel Xeon E7-8857 CPU and 1.58 TiB RAM⁶ running Linux. See the
full version of the paper [11] for a detailed description. blic solved 96 out of
172 benchmarks, CAQE solved 98, DepQBF solved 87, and PGBDDQ solved 91.
Figure 6(a) shows the run times of blic compared to the other tools. The plot
indicates that blic is competitive on these instances, with a few cases, mostly
from the Lonsing family of benchmarks, where blic is slower than DepQBF by
an order of magnitude. Figure 6(b) shows the overhead of certification: for each
benchmark (that finishes within a 10 min timeout), we plot the ratio of the time to
compute the answer to the time it takes to run Prover in CPCERTIFY. The dotted
regression line shows CPCERTIFY has a 2.8× overhead over computing BDDs.
For this set of examples, the error probability never exceeds $10^{-8.9}$ ($10^{-11.6}$
when Lonsing examples are excluded); running the verifier $k$ times reduces it to
$10^{-8.9k}$.


RQ2: Communication Cost of Certification and Verifier Time. We
explore RQ2 by comparing the number of bytes exchanged between Prover and
Verifier and the time needed for Verifier to execute CPCERTIFY with the number
of bytes in a QBF proof and the time required to verify the proof produced by
DepQBF and PGBDDQ, for which we use QRPcheck [24,26] and qchecker [7,25],
respectively. Note that the latter is written in Python.


6 blic uses at most 60 GiB on the shown benchmarks, 5 GiB when excluding timeouts. 
We show that the overhead of certification is low. Figure 6(c) shows the run
time of Verifier. This is generally negligible for blic, except for the Lonsing and
KBKF families, which have a large number of variables, but very small BDDs.
Figure 6(d) shows the total number of bytes exchanged between Prover and
Verifier in blic against the size of the proofs generated by PGBDDQ and DepQBF.
For large instances, the number of bytes exchanged in blic is significantly smaller
than the size of the proofs. The exceptions are again the Lonsing and KBKF
families of instances. For both plots, the dotted line results from a log-linear
regression.

In addition to the Crafted Instances, we compare against PGBDDQ on a 
challenging family of benchmarks used in the PGBDDQ paper (matching the 
parameters of [7, Table 3]); these are QBF encodings of a linear domino placing
game. Our results are summarised in Table 2. The upper bound on Verifier
error is 10~°:?2. We show that blic outperforms PGBDDQ both in overall cost 
of computing the answer and the certificates as well as in the number of bytes 
communicated and the time used by Verifier. 

Our results indicate that giving up absolute certainty through interactive 
protocols can lead to an order of magnitude smaller communication cost and 
several orders of magnitude smaller checking costs for the verifier. 


7 Conclusion 


We have presented a solver that combines BDDs with an interactive protocol. 
blic can be seen as a self-certifying BDD library able to certify the correctness of 
arbitrary sequences of BDD operations. In order to trust the result, a user must 
only trust the verifier (a straightforward program that poses challenges to the 
prover). We have shown that blic (including certification time) is competitive 
with other solvers, and Verifier’s time and error probabilities are negligible. 

Our results show that IP = PSPACE can become an important result not only 
in theory but also in the practice of automatic verification. From this perspec- 
tive, our paper is a first step towards practical certification based on interactive 
protocols. While we have focused on BDDs, we can ask the more general ques- 
tion: which practical automated reasoning algorithms can be made efficiently 
certifying? For example, whether there is an interactive protocol and an effi- 
cient certifying version of modern SAT solving algorithms is an interesting open 
challenge. 
Abstract. Dubbed a safer C, Rust is a modern programming language
that combines memory safety and low-level control. This interesting com- 
bination has made Rust very popular among developers and there is a 
growing trend of migrating legacy codebases (very often in C) to Rust. In 
this paper, we present a C to Rust translation approach centred around 
static ownership analysis. We design a suite of analyses that infer owner- 
ship models of C pointers and automatically translate the pointers into 
safe Rust equivalents. The resulting tool, CROWN, scales to real-world 
codebases (half a million lines of code in less than 10s) and achieves a 
high conversion rate. 


1 Introduction 


Rust [33] is a modern programming language which features an exciting combi- 
nation of memory safety and low-level control. In particular, Rust takes inspi- 
ration from ownership types to restrict the mutation of shared state. The Rust 
compiler is able to statically verify the corresponding ownership constraints and 
consequently guarantee memory and thread safety. This distinctive advantage 
of provable safety makes Rust a very popular language, and the prospect of 
migrating legacy codebases in C to Rust is very appealing. 

In response to this demand, automated tools translating C code to Rust
have emerged from both industry and academia [17, 26, 31]. Among them, the industrial
strength translator C2Rust [26] rewrites C code into the Rust syntax while pre- 
serving the original semantics. The translation does not synthesise an ownership 
model and thus is not able to do more than replicating the unsafe use of pointers 
in C. Consequently, the Rust code must be labelled with the unsafe keyword 
which allows certain actions that are not checked by the compiler. More recent 
work focuses on reducing this unsafe labelling. In particular, the tool Laertes [17] 
aims to rewrite the (unsafe) code produced by C2Rust by searching the solu- 
tion space guided by the type error messages from the Rust compiler. This is 
impressive, as for the first time proper Rust code beyond a line-by-line direct 
conversion from the original C source may be synthesised. On the other hand,
the limit of the trial-and-error approach is also clear: the system neither supports
reasoning about the generation process, nor creates any new understanding
of the target code (other than the fact that it compiles successfully).
In this paper, we take a more principled approach by developing a novel
ownership analysis of pointers that is efficient (scaling to large programs: half
a million LOC in less than 10 s), sophisticated (handling nested pointers and
inductively-defined data structures), and precise (being field and flow sensitive).
Our ownership analysis makes a strengthening assumption about the Rust ownership
model, which obviates the need for an aliasing analysis. While this assumption
excludes a few safe Rust uses (see discussion in Sect. 5), it ensures that the
ownership analysis is both scalable and precise, which is subsequently reflected
in the overall scalability and precision of the C to Rust translation.

The primary goal of this analysis is of course to facilitate the C to Rust
translation. Indeed, as we will see in the rest of the paper, an automated translation
system is built to encode the ownership models in the generated Rust
code which is then proven safe by the Rust compiler. However, in contrast to
the trial-and-error use of the Rust compiler common in existing approaches [17,31],
this analysis approach actually extracts new knowledge about ownership from code, which
may lead to other future utilities including preventing memory leaks (currently
allowed in safe Rust), identifying inherently unsafe code fragments, and so on.
Our current contributions are:


— design a scalable and precise ownership analysis that is able to handle complex 
inductively-defined data structures and nested pointers. (Section 5) 

— develop a refactoring technique for Rust leveraging ownership analyses to 
enhance code safety. While in this paper we focus on applying our technique 
to the translation from C to Rust, it can be used to improve the safety of any 
unsafe Rust code. (Section 6) 

— implement a prototype tool (CROWN, standing for C to Rust OWNership 
guided translation) that translates C code into Rust with enhanced safety. 
(Section 7) 

— evaluate CROWN with a benchmark suite including commonly used data 
structure libraries and real-world projects (ranging from 150 to half a million 
LOC) and compare the result with the state-of-the-art. (Section 8) 


2 Background 


We start by giving a brief introduction of Rust, in particular its ownership system 
and the use of pointers, as they are central to memory safety. 


2.1 Rust Ownership Model 


Ownership in Rust denotes a set of rules that govern how the Rust compiler 
manages memory [33]. The idea is to associate each value with a unique owner. 
This feature is useful for memory management. For example, when the owner 
goes out of scope, the memory allocated for the value can be automatically 
recycled. 
let mut v = ...;
let mut u = v; // ownership is transferred to u

In the above snippet, the assignment of v to u also transfers ownership, after 
which it is illegal to access v until it is re-assigned a value again. 
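
The move and the later re-assignment can be sketched as a small function. The name `move_and_reassign` and the choice of `String` as the owned value are illustrative assumptions, not taken from the paper.

```rust
/// Moves a value from `v` to `u`, then re-assigns `v` so that it is usable again.
fn move_and_reassign() -> (String, String) {
    let mut v = String::from("first");
    let u = v; // ownership of the string is transferred to u
    // println!("{}", v); // would not compile here: `v` has been moved
    v = String::from("second"); // re-assignment makes `v` valid again
    (u, v)
}
```

Calling `move_and_reassign()` yields the originally moved value in the first component and the re-assigned one in the second.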

This permanent transfer of ownership gives strong guarantees but can be 
cumbersome to manage in programming. In order to allow sharing of values 
between different parts of the program, Rust uses the concept of borrowing, 
which refers to creating a reference (marked by an ampersand). A reference 
allows referring to some value without taking ownership of it. Borrowing gives 
the temporary right to read and, potentially, uniquely mutate the referenced 
value. 

This concept of time creates another dimension of ownership management 
known as lifetime. For mutable references (as marked by mut in the above exam- 
ples), only one reference is allowed at a time. But for immutable references (the 
ones without the mut marking), multiple of them can coexist as long as there 
isn’t any mutable reference at the same time. As one can expect, this interaction 
of mutable and immutable references, and their lifetimes is highly non-trivial. In 
this paper, we focus on analysing mutable references. 
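
As a minimal sketch of these borrowing rules (the function `borrow_demo` and its data are hypothetical), a single mutable reference is used first, and only once it is no longer live are several immutable references created:

```rust
/// Appends through a unique mutable borrow, then reads through two shared borrows.
fn borrow_demo() -> (usize, i32) {
    let mut data = vec![1, 2, 3];
    {
        let m = &mut data; // the only live reference while it is in use
        m.push(4);
        // let r = &data; m.push(5); // rejected: immutable borrow while `m` is still in use
    }
    let a = &data; // several immutable references may coexist
    let b = &data;
    (a.len(), b[3])
}
```

The commented-out line shows the kind of overlap between a mutable and an immutable reference that the compiler rejects.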


2.2 Pointer Types in Rust 


Rust has a richer pointer system than C. The primitive C-style pointers (written
as *const T or *mut T) are known as raw pointers, which are ignored by the
Rust compiler for ownership and lifetime checks. Raw pointers are a major source
of unsafe Rust (more below). Idiomatic Rust instead advocates box pointers
(written as Box<T>) as owning pointers that uniquely own heap allocations,
as well as references (written as &mut T or &T as discussed in the previous
subsection) as non-owning pointers that are used to access values owned by
others. Rust also offers smart pointers for which the borrow rules are checked
at runtime (e.g. RefCell<T>). We aim for our translation to maintain CPU
time without additional runtime overhead, and therefore we do not refactor raw
pointers into RefCell<T>s.

C-style array pointers are represented in Rust as references to arrays and slice 
references, with array bounds known at compile time and runtime, respectively. 
The creation of meta-data such as array bounds is beyond the scope of ownership 
analysis. In this work, we keep array pointers as raw pointers in the translated 
code. 
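
The three pointer kinds that matter for the translation can be compared side by side. The following is only an illustrative sketch; the function `pointer_kinds` and its values are assumptions made for this example.

```rust
/// Reads a value through a raw pointer, a Box, and a mutable reference.
fn pointer_kinds() -> (i32, i32, i32) {
    let mut x = 10;
    let raw: *mut i32 = &mut x; // raw pointer: no ownership or lifetime checks
    let via_raw = unsafe { *raw }; // dereferencing it requires `unsafe`

    let owned: Box<i32> = Box::new(20); // owning pointer to a heap allocation
    let via_box = *owned; // the allocation is freed when `owned` goes out of scope

    let mut y = 30;
    let r: &mut i32 = &mut y; // non-owning, uniquely mutable reference
    *r += 1;
    (via_raw, via_box, y)
}
```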


2.3 Unsafe Rust 


As a pragmatic design, Rust allows programs to contain features that cannot 
be verified by the compiler as memory safe. This includes dereferencing raw 
pointers, calling low level functions, and so on. Such uses must be marked with 


462 H. Zhang et al. 


the unsafe keyword and form fragments of unsafe Rust. It is worth noting that 
unsafe does not turn off all compiler checks; safe pointers are still checked. 

Unsafe Rust is often used to implement data structures with complex shar- 
ing, overcome incompleteness issues of the Rust compiler, and support low-level 
systems programming [2,18]. But it can also be used for other reasons. For exam- 
ple, c2rust [26] directly translates C pointers into raw pointers. Without unsafe 
Rust, the generated code would not compile. 
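
A minimal sketch of such an unsafe fragment is given below. It uses `Box::into_raw` and `Box::from_raw` from the standard library in place of C's malloc and free; this substitution and the function name `raw_roundtrip` are assumptions of the example, not what c2rust emits.

```rust
/// Allocates a value, accesses it through a raw pointer, and releases it manually.
fn raw_roundtrip() -> i32 {
    // Give up compiler-tracked ownership, as a C-style pointer would.
    let p: *mut i32 = Box::into_raw(Box::new(41));
    unsafe {
        *p += 1; // raw dereference: only allowed inside `unsafe`
        let value = *p;
        drop(Box::from_raw(p)); // manual release, analogous to free() in C
        value
    }
}
```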


3 Overview 


In this section, we present an overview of CROWN via two examples. The first 
example provides a detailed description of the push method for a singly-linked 
list, whereas the second shows a snippet from a real-world benchmark. 


(a) C code

 1 struct Node {
 2     int data;
 3     struct Node * next;
 4 };
 5
 6 struct List {
 7     Node * head;
 8 };
 9
10 void push(struct List* list, int new_data) {
11     struct Node* new_node = (struct Node*) malloc(sizeof(struct Node));
12     new_node->data = new_data;
13     new_node->next = list->head;
14     list->head = new_node;
15 }

(b) c2rust result

 1 #[repr(C)]
 2 #[derive(Copy, Clone)]
 3 pub struct Node {
 4     pub data: i32,
 5     pub next: *mut Node,
 6 }
 7
 8 #[repr(C)]
 9 #[derive(Copy, Clone)]
10 pub struct List {
11     pub head: *mut Node,
12 }
13
14 pub unsafe extern "C" fn push(mut list: *mut List, mut new_data: i32) {
15     let mut new_node = malloc(::std::mem::size_of::<Node>() as libc::c_ulong) as *mut Node;
16     (*new_node).data = new_data;
17     (*new_node).next = (*list).head;
18     (*list).head = new_node;
19 }

(c) CROWN result

 1 #[repr(C)]
 2 pub struct Node {
 3     pub data: i32,
 4     pub next: Option<Box<Node>>,
 5 }
 6
 7 #[repr(C)]
 8 pub struct List {
 9     pub head: Option<Box<Node>>,
10 }
11
12 pub unsafe extern "C" fn push(mut list: Option<&mut List>, mut new_data: i32) {
13     let mut new_node = Some(Box::new(<Node as Default>::default()));
14     (*new_node.as_deref_mut().unwrap()).data = new_data;
15     (*new_node.as_deref_mut().unwrap()).next = (*list.as_deref_mut().unwrap()).head.take();
16     (*list.as_deref_mut().unwrap()).head = new_node;
17 }

Fig. 1. Pushing into a singly-linked list


3.1 Pushing into a Singly-Linked List 


The C code of function push in Fig. 1a allocates a new node where it stores the
data received as argument. The new node subsequently becomes the head of list.
This code is translated by c2rust to the Rust code in Fig. 1b. Notably, the c2rust
translation is syntax-based and simply changes all the C pointers to *mut raw
pointers. Given that dereferencing raw pointers is considered an unsafe operation
in Rust (e.g. the dereferencing of new_node at line 16 in Fig. 1b), the push method
must be annotated with the unsafe keyword (alternatively, it could be placed
inside an unsafe block). Additionally, c2rust introduces two directives for the two
struct definitions, #[repr(C)] and #[derive(Copy, Clone)]. The former keeps
the data layout the same as in C for possible interoperation, and the latter instructs
that the corresponding type can only be duplicated through copying.

While c2rust uses raw pointers in the translation, the ownership scheme in
Fig. 1b obeys the Rust ownership model, meaning that the raw pointers could be
translated to safe ones. A pointer to a newly allocated node is assigned to new_node
at line 15. This allows us to infer that the ownership of the newly allocated node
belongs to new_node. Then, at line 18, the ownership is transferred from new_node
to (*list).head. Additionally, if (*list).head owns any memory object prior
to line 17, then its ownership is transferred to (*new_node).next at line 17. This
ownership scheme corresponds to safe pointer use: (i) each memory object is associated
with a unique owner and (ii) it is dropped when its owner goes out of scope.
As an illustration for (i), when the ownership of the newly allocated memory is
transferred from new_node to (*list).head at line 18, (*list).head becomes
the unique owner, whereas new_node is made invalid and it is no longer used. For
(ii), given that argument list of push is an output parameter (i.e. a parameter
that can be accessed from outside the function), we assume that it must be owning
on exit from the method. Thus, no memory object is dropped in the push method,
but rather returned to the caller.

CROWN infers the ownership information of the code translated by c2rust, 
and uses it to translate the code to safer Rust in Fig. 1c. As explained next, 
CROWN first retypes raw pointers into safe pointers based on the ownership 
information, and then rewrites their uses. 


Retyping Pointers in Crown. If a pointer owns a memory object at any point 
within its scope, CROWN retypes it into a Box pointer. For instance, in Fig. 1c, 
local variable new_node is retyped to be Option<Box<Node>> (safe pointer types 
are wrapped into Option to account for null pointer values). Variable new_node 
is non-owning upon function entry, becomes owning at line 13 and ownership is 
transferred out again at line 16. 

For struct fields, CROWN considers all the code in the scope of the struct
declaration. If a struct field owns a memory object at any point within the scope
of its struct declaration, then it is retyped to Box. In Fig. 1b, fields next and
head are accessed via access paths (*new_node).next and (*list).head, and
given ownership at lines 17 and 18, respectively. Consequently, they are retyped
to Box at lines 4 and 9 in Fig. 1c, respectively.

A special case is that of output parameters, e.g. list in our example. For 
such parameters, although they may be owning, CROWN retypes them to &mut 
in order to enable borrowing. In push, the input argument list is retyped to 
Option<&mut List>. 


Rewriting Pointer Uses in Crown. After retyping pointers, CROWN rewrites
their uses. The rewrite process takes into consideration both their new type and
the context in which they are being used. Due to the Rust semantics, the rewrite
rules are slightly intricate (see Sect. 6). For instance, the dereference of new_node
at line 14 is rewritten to (*new_node.as_deref_mut().unwrap()) as it needs to
be mutated and the optional part of the Box needs to be unwrapped. Similarly,
at line 15, (*list).head is rewritten to be
(*list.as_deref_mut().unwrap()).head.take() as the LHS of the assignment
expects a Box pointer.
After the rewrite performed by CROWN, the unsafe block annotation is not
needed anymore. However, CROWN does not attempt to remove such annota- 
tions. Notably, safe pointers are always checked by the Rust compiler, even 
inside unsafe blocks. 
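
For comparison, the following is a hand-written, fully safe counterpart of the push function. It is a sketch of what the retyped pointers permit, not CROWN's output: it drops the unsafe and extern "C" annotations, takes `list` as a plain `&mut List` instead of `Option<&mut List>`, and adds a hypothetical helper `to_vec` for inspecting the list.

```rust
#[derive(Default)]
pub struct Node {
    pub data: i32,
    pub next: Option<Box<Node>>,
}

#[derive(Default)]
pub struct List {
    pub head: Option<Box<Node>>,
}

/// Pushes `new_data` at the front of `list` using only safe pointers.
pub fn push(list: &mut List, new_data: i32) {
    let mut new_node = Box::new(Node::default()); // new_node becomes owning
    new_node.data = new_data;
    new_node.next = list.head.take(); // the old head's ownership moves to `next`
    list.head = Some(new_node); // ownership moves out of new_node
}

/// Collects the stored values from head to tail (helper for inspection only).
pub fn to_vec(list: &List) -> Vec<i32> {
    let mut out = Vec::new();
    let mut cur = list.head.as_deref();
    while let Some(node) = cur {
        out.push(node.data);
        cur = node.next.as_deref();
    }
    out
}
```

The call to `take()` plays the same role as in Fig. 1c: it moves the old head out of the list and leaves `None` behind, so no memory object ever has two owners.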


3.2 Freeing an Argument List in bzip2 


We next show the transformation of a real-world code snippet with a loop struc- 
ture: a piece of code in bzip2 that frees argument lists. bzip2 defines a singly- 
linked list like structure, Cell, that holds a list of argument names. In Fig. 2, 
we extract from the source code a snippet that frees the argument lists. Here, 
the local variable argList is an already constructed argument list, and Char is 
a type alias to C-style characters. As a note, Cell in Figs. 2b and 2c does not 
refer to Rust’s std::cell::Cell. 


(a) C definition

typedef
struct zzzz {
    Char *name;
    struct zzzz *link;
}
Cell;
[...]
Cell* aa = argList;
while (aa != NULL) {
    Cell* aa2 = aa->link;
    if (aa->name)
        free(aa->name);
    free(aa);
    aa = aa2;
}
[...]

(b) c2rust result

#[derive(Copy, Clone)]
#[repr(C)]
pub struct zzzz {
    pub name: *mut Char,
    pub link: *mut zzzz,
}
pub type Cell = zzzz;
[...]
let mut aa: *mut Cell = argList;
while !aa.is_null() {
    let mut aa2 = (*aa).link;
    if !(*aa).name.is_null() {
        free((*aa).name as *mut libc::c_void);
    }
    free(aa as *mut libc::c_void);
    aa = aa2;
}
[...]

(c) CROWN result

#[repr(C)]
pub struct zzzz {
    pub name: *mut /* ... */ Char,
    pub link: Option<Box<zzzz>>,
}
pub type Cell = zzzz;
[...]
let mut aa: Option<Box<Cell>> = argList;
while !aa.as_deref().is_none() {
    let mut aa2 = (*aa.as_deref_mut().unwrap()).link.take();
    if !(*aa.as_deref().unwrap()).name.is_null() {
        free((*aa.as_deref().unwrap()).name as *mut libc::c_void);
    }
    aa = aa2;
}
[...]

Fig. 2. Freeing an argument list


CROWN accurately infers an ownership scheme for this snippet. Firstly, own- 
ership of argList is transferred to aa, which is to be freed in the subsequent 
loop. Inside the loop, ownership of link accessed from aa is firstly transferred 
to aa2, then ownership of name accessed from aa is released in a call to free. 
After the conditional, ownership of aa is also released. Last of all, aa regains 
ownership from aa2. 
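
The same ownership scheme can be written out in safe Rust. This is a hand-written sketch rather than CROWN's output: the `name` field is modelled as `Option<String>` instead of a raw character pointer, and the function `free_list` with its returned counter is hypothetical.

```rust
pub struct Cell {
    pub name: Option<String>, // stand-in for the C string (assumption of this sketch)
    pub link: Option<Box<Cell>>,
}

/// Frees a list cell by cell and returns how many cells were released.
pub fn free_list(arg_list: Option<Box<Cell>>) -> usize {
    let mut freed = 0;
    let mut aa = arg_list; // ownership of the whole list moves to `aa`
    while let Some(mut cell) = aa {
        let aa2 = cell.link.take(); // ownership of `link` moves to `aa2`
        drop(cell.name.take()); // ownership of `name` is released
        drop(cell); // ownership of the cell itself is released
        freed += 1;
        aa = aa2; // `aa` regains ownership from `aa2`
    }
    freed
}
```

The explicit calls to `drop` only make the release points visible; the values would also be freed automatically at the end of each iteration.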


Handling of Loops. For loops, CROWN only analyses their body once as that 
will already expose all the ownership information. For inductively defined data 
structures such as Cell, while further unrolling of loop bodies explores the data 
structures deeper, it does not expose any new struct fields: pointer variables and 
pointer struct fields do not change ownership between loop iterations. Addition- 
ally, CROWN emits constraints that equate the ownership of all local pointers at 
the loop entry and exit. For example, the ownership statuses of aa and aa2 at 
loop entry are made equal with those at loop exit, and inferred to be owning 
and non-owning, respectively. 


Ownership Guided C to Rust Translation 465 


Handling of Null Pointers. It is a common C idiom for pointers to be
checked against null after malloc or before free: if !p.is_null() { free(p); }.
This could be problematic since the then-branch and the else-branch would
have conflicting ownership statuses for p. We adopt a solution similar to that
of [24]: we insert an explicit null assignment in the null branch, giving
if !p.is_null() { free(p); } else { p = ptr::null_mut(); }. As we treat null
pointers as both owning
and non-owning, the ownership of p will be dictated by the non-null branch,
enabling CROWN to infer the correct ownership scheme.


Translation. With the above ownership scheme, CROWN performs the rewrites 
as in Fig. 2c. Note that we do not attempt to rewrite name since it is an array 
pointer (see Sect. 7 for limitations). 


4 Architecture 


In this section, we give a brief overview of CROWN’s architecture. CROWN takes 
as input a Rust program with unsafe blocks, and outputs a safer Rust program, 
where a portion of the raw pointers have been retyped as safe ones (in accordance
with the Rust ownership model), and their uses modified accordingly. In this paper
we focus on applying our technique to programs automatically translated by 
c2rust, which maintain a high degree of similarity to the original C ones, where 
the C syntax is replaced by Rust syntax. 

CROWN applies several static analyses on the MIR of Rust to infer properties 
of pointers: 


— Ownership analysis: computes ownership information about the pointers 
in the code, i.e. for each pointer it infers whether it is owning/non-owning at 
particular program locations. 

— Mutability analysis: infers which pointers are used to modify the object 
they point to (inspired by [22,25]). 

— Fatness analysis: distinguishes array pointers from non-array pointers 
(inspired by [32]). 


The results of these analyses are summarised as type qualifiers [21]. A type 
qualifier is an atomic property (i.e., ownership, mutability, and fatness) that 
‘qualifies’ the standard pointer type. These qualifiers are then utilised for pointer 
retyping. For example, an owning, non-array pointer is retyped to Box. After
pointers have been retyped, CROWN rewrites their usages accordingly. 
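
The way qualifiers drive retyping can be sketched as a small decision function. This is a simplified illustration, not CROWN's implementation: the `Qualifiers` struct, the `retype` function, and the string-based output are hypothetical. The Box and raw-pointer cases follow the text; the non-owning cases assume that such pointers become references, with `&T` for read-only pointers being a guess that the text does not confirm.

```rust
/// The three qualifiers inferred for a pointer (hypothetical representation).
#[derive(Clone, Copy)]
struct Qualifiers {
    owning: bool,
    mutable: bool,
    fat: bool, // true for array pointers
}

/// Picks a Rust type, rendered as text, for a pointer to `pointee`.
fn retype(q: Qualifiers, pointee: &str) -> String {
    if q.fat {
        // Array pointers are kept as raw pointers in the translated code.
        format!("*mut {}", pointee)
    } else if q.owning {
        // Owning, non-array pointers become Box, wrapped in Option for null.
        format!("Option<Box<{}>>", pointee)
    } else if q.mutable {
        format!("Option<&mut {}>", pointee)
    } else {
        // Assumption of this sketch: read-only, non-owning pointers.
        format!("Option<&{}>", pointee)
    }
}
```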


5 Ownership Analysis 


The goal of our ownership analysis is to compute an ownership scheme for a 
given program that obeys the Rust ownership model, if such a scheme exists. The 
ownership scheme contains information about whether pointers in the program 
are owning or non-owning at particular program locations. At a high-level, our 
analysis works by generating a set of ownership constraints (Sect. 5.2), which are 
then solved by a SAT solver (Sect. 5.3). A satisfying assignment for the ownership
constraints is an ownership scheme that obeys the Rust semantics. 

Our ownership analysis is flow and field sensitive, where the latter enables
inferring ownership information for pointer struct fields. To satisfy field sensitivity,
we track ownership information for access paths [10,14,29]. An access path
represents a memory location by the way it is accessed from an initial, base
variable, and comprises the base variable and a sequence of field selection
operators. For the program in Fig. 1b, some example access paths are new_node
(consisting only of the base variable), (*new_node).next, and (*list).head.
Our analysis associates an ownership variable with each access path, e.g. p has
associated ownership variable $O_p$, and (*p).next has associated ownership
variable $O_{(*p).next}$. Each ownership variable can take the value 1 if the corresponding
access path is owning, or 0 if it is non-owning. By ownership of an access path
we mean the ownership of the field (or, more generally, pointer) accessed last
through the access path, e.g. the ownership of (*new_node).next refers to the
ownership of field next.


5.1 Ownership and Aliasing 


One of the main challenges of designing an ownership analysis is the interaction
between ownership and aliasing. To understand the problem, let us consider
the pointer assignment at line 3 in the code listing below. We assume that the
lines before the assignment allow inferring that q, (*q).next and r are owning,
whereas p and (*p).next are non-owning. Additionally, we assume that the lines
after the assignment require (*p).next to be owning (e.g. (*p).next is being
explicitly freed). From this, an ownership analysis could reasonably conclude that
ownership transfer happens at line 3 (such that (*p).next becomes owning), and
the inferred ownership scheme obeys the Rust semantics.


1 let p, r, q : *mut Node;
2 // p and (*p).next non-owning; q, (*q).next and r owning
3 (*p).next = r;
4 // (*p).next must have ownership


Let’s now also consider aliasing. A possible assumption is that, just before line
3, p and q alias, meaning that (*p).next and (*q).next also alias. Then, after
line 3, (*p).next and (*q).next will still alias (pointing to the same memory
object). However, according to the ownership scheme above, both (*p).next
and (*q).next are owning, which is not allowed in Rust, where a memory
object must have a unique owner. This discrepancy was not detected by the
ownership analysis mimicked above. The issue is that the ownership analysis
ignored aliasing. Indeed, ownership should not be transferred to (*p).next if
there exists an owning alias that, after the ownership transfer, continues to point
to the same memory object as (*p).next.

Precise aliasing information is very difficult to compute, especially in the
presence of inductively defined data structures. In the current paper, we alleviate
the need to check aliasing by making a strengthening assumption about
the Rust ownership model: we restrict the way in which pointers can acquire
ownership along an access path, thus limiting the interaction between ownership
and aliasing. In particular, we introduce a novel concept of ownership monotonicity.
This property states that, along an access path, the ownership values
of pointers can only decrease (see Definition 1, where is_prefix(a, b) returns true
if access path a is a prefix of b, and false otherwise; e.g. is_prefix(p, (*p).next)
= true). Going back to the previous code listing, the ownership monotonicity
implies that, for access path (*p).next we have $O_p \ge O_{(*p).next}$, and for access
path (*q).next we have $O_q \ge O_{(*q).next}$. This means that, if (*p).next is
allowed to take ownership, then p must already be owning. Consequently, all
aliases of p must be non-owning, which means that all aliases of (*p).next,
including (*q).next, are non-owning.


Definition 1 (Ownership monotonicity). Given two access paths a and b,
if is_prefix(a, b), then $O_a \ge O_b$.
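
Definition 1 can be checked mechanically for a given assignment of ownership values. In the sketch below, the encoding of an access path as a list of components (so that (*p).next becomes ["p", "next"]) and the function names are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// An access path as a component list, e.g. ["p", "next"] for (*p).next.
type Path = Vec<&'static str>;

/// True if access path `a` is a prefix of access path `b`.
fn is_prefix(a: &Path, b: &Path) -> bool {
    b.starts_with(a)
}

/// Checks Definition 1: whenever is_prefix(a, b) holds, O_a >= O_b.
fn is_monotonic(ownership: &HashMap<Path, u8>) -> bool {
    ownership.iter().all(|(a, oa)| {
        ownership
            .iter()
            .all(|(b, ob)| !is_prefix(a, b) || oa >= ob)
    })
}
```

An assignment in which p is non-owning (0) while (*p).next is owning (1) is rejected by `is_monotonic`, which is exactly the situation that monotonicity rules out.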


Ownership monotonicity is stricter than the Rust semantics, causing our analysis 
to reject two scenarios that would otherwise be accepted by the Rust compiler 
(see discussion in Sect. 5.4). In this work, we made the design decision to use
ownership monotonicity over aliasing analysis as it allows us to retain more 
control over the accuracy of the translation. Conversely, using an aliasing analysis 
would mean that the accuracy of the translation is directly dictated by the 
accuracy of the aliasing analysis (i.e. false alarms from the aliasing analysis [23, 
40] would result in CROWN not translating pointers that are actually safe). With 
ownership monotonicity, we know exactly what the rejected valid ownership 
schemes are, and we can explicitly enable them (again, see discussion in Sect. 5.4). 


5.2 Generation of Ownership Constraints 


During constraint generation, we assume a given k denoting the length of the 
longest access path used in the code. This enables us to capture the ownership 
of all the access paths exposed in the code. Later in this section, we will discuss 
the handling of loops, which may expose longer access paths. 

Next, we denote by $P$ the set of all access paths in a program, base_var(a)
returns the base variable of access path a, and |a| computes the length of the access
path a in terms of applied field selection operators from the base variable. In the
context of the previous code listing, base_var((*p).next) = p, base_var(p) = p,
|p| = 1 and |(*p).next| = 2. Then, we define ap(v, lb, ub) to return the set of
access paths with base variable v and length in between lower bound lb and upper
bound ub: $ap(v, lb, ub) = \{a \in P \mid base\_var(a) = v \land lb \le |a| \le ub\}$.
For illustration, we have ap(p, 1, 2) = {p, (*p).next}.
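
These definitions translate directly into code. The following sketch uses a hypothetical `AccessPath` struct (a base variable plus the selected fields) and follows the convention above that the base variable alone has length 1.

```rust
/// An access path (hypothetical encoding): base variable plus selected fields.
#[derive(Clone, Debug, PartialEq)]
struct AccessPath {
    base: &'static str,
    fields: Vec<&'static str>,
}

impl AccessPath {
    /// |a| as used in the text: the base variable alone has length 1.
    fn len(&self) -> usize {
        1 + self.fields.len()
    }
}

/// ap(v, lb, ub): the paths in `all` with base variable `v` and lb <= |a| <= ub.
fn ap(all: &[AccessPath], v: &str, lb: usize, ub: usize) -> Vec<AccessPath> {
    all.iter()
        .filter(|a| a.base == v && lb <= a.len() && a.len() <= ub)
        .cloned()
        .collect()
}
```

With `all` holding p, (*p).next and q, the call `ap(&all, "p", 1, 2)` returns p and (*p).next, matching the illustration in the text.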


Ownership Transfer. The program instructions where ownership transfer can 
happen are (pointer) assignment and function call. Here we discuss assignment 
and, due to space constraints, we leave the rules for interprocedural ownership 
analysis in the extended version [41]. Our rule for ownership transfer at assign- 
ment site follows Rust’s Box semantics: when a Box pointer is moved, the 
ASSIGN
$$
\frac{
\begin{array}{c}
v = base\_var(p), \quad w = base\_var(q), \\
a \in ap(v, |p|, k), \quad b \in ap(w, |q|, k), \quad c \in ap(v, 1, |p|-1), \quad d \in ap(w, 1, |q|-1), \\
|a| - |p| = |b| - |q|, \quad |c| = |d|, \\
is\_prefix(p, a), \quad is\_prefix(q, b), \quad is\_prefix(c, p), \quad is\_prefix(d, q), \\
C' = C \cup \{ O_a = 0 \land O_{a'} + O_{b'} = O_b \land O_{c'} = O_c \land O_{d'} = O_d \}
\end{array}
}{C \vdash p = q; \Rightarrow C'}
$$

Fig. 3. Ownership constraint generation for assignment 


object it points to is moved as well. For instance, in the following Rust pseu- 
docode snippet: 


1 let p, q: Box<Box<i32>>;
2 p = q; // ownership transfer occurs
3 // the use of q and *q is disallowed


when ownership is transferred from q to p, *q also loses ownership. Except for 
reassignment, the use of a Box pointer after it lost its ownership is disallowed, 
hence the use of q or *q is forbidden at line 3. 
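
A compilable variant of this pseudocode is sketched below; the function name `nested_move` and the stored value are illustrative assumptions.

```rust
/// Moving the outer Box moves the inner Box along with it.
fn nested_move() -> i32 {
    let q: Box<Box<i32>> = Box::new(Box::new(7));
    let p = q; // ownership of both the outer and the inner allocation moves to p
    // let x = **q; // would not compile: `q`, and hence `*q`, has been moved
    **p
}
```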

Consequently, we enforce the following ownership transfer rule: if ownership 
transfer happens for a pointer variable (e.g. p and q in the example), then it 
must happen for all pointers reachable from that pointer (e.g. *p and *q). The 
ownership of pointer variables from which the pointer under discussion is reach- 
able remains the same (e.g. if ownership transfer happens for some assignment 
*p = *q in the code, then q and p retain their respective previous ownership 
values). 


Possible Ownership Transfer at Pointer Assignment: The ownership transfer rule
at pointer assignment site is captured by rule ASSIGN in Fig. 3. The judgement
$C \vdash p = q; \Rightarrow C'$ denotes the fact that the assignment is analysed under the set
of constraints $C$, and generates $C'$. We use prime notation to denote variables
after the assignment. Given pointer assignment p = q, a and b represent all the
access paths respectively starting from p and q, whereas c and d denote the access
paths from the base variables of p and q that reach p and q, respectively. Then,
equality $O_{a'} + O_{b'} = O_b$ captures the possibility of ownership transfer for all
access paths originating at p and q: (i) If transfer happens then the ownership
of b transfers to a' ($O_{a'} = O_b$ and $O_{b'} = 0$). (ii) Otherwise, the ownership
values are left unchanged ($O_{a'} = O_a$ and $O_{b'} = O_b$). The last two equalities,
$O_{c'} = O_c \land O_{d'} = O_d$, denote the fact that, for both (i) and (ii), pointers on access
paths c and d retain their previous ownership. Note that “+” is interpreted as
the usual arithmetic operation over $\mathbb{N}$, where we impose an implicit constraint
$0 \le O \le 1$ for every ownership variable $O$.
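
To see which ownership assignments the constraint admits for a single pair of paths a and b, one can simply enumerate all 0/1 values. The sketch below is a brute-force illustration only (the paper solves the constraints with a SAT solver); the function name and the tuple order $(O_a, O_b, O_{a'}, O_{b'})$ are assumptions of this example.

```rust
/// Enumerates all 0/1 assignments (O_a, O_b, O_a', O_b') that satisfy
/// O_a = 0 and O_a' + O_b' = O_b for one pair of access paths.
fn assign_solutions() -> Vec<(u8, u8, u8, u8)> {
    let mut solutions = Vec::new();
    for o_a in 0..=1u8 {
        for o_b in 0..=1u8 {
            for o_a2 in 0..=1u8 {
                for o_b2 in 0..=1u8 {
                    if o_a == 0 && o_a2 + o_b2 == o_b {
                        solutions.push((o_a, o_b, o_a2, o_b2));
                    }
                }
            }
        }
    }
    solutions
}
```

Three assignments remain: (0, 0, 0, 0), where b owns nothing and nothing is transferred; (0, 1, 0, 1), where b keeps its ownership; and (0, 1, 1, 0), where ownership moves from b to a'.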


C Memory Leaks: In the ASSIGN rule, we add constraint $O_a = 0$ to $C'$ in order
to force a to be non-owning before the assignment. Conversely, having a owning
before being reassigned via the assignment under analysis signals a memory
leak in the original C program. Given that in Rust memory is automatically
returned, allowing the translation to happen would change the semantics of the
original program by fixing the memory leak. Instead, our design choice is to prevent
the ownership analysis from generating such a solution. As we will explain
in Sect. 8, we intend for our translation to preserve memory usage (including
possible memory leaks).


Simultaneous Ownership Transfer Along an Access Path: One may observe that
the constraints generated by ASSIGN do not fully capture the stated ownership
transfer rule. In particular, they do not ensure that, whenever ownership transfer
occurs from q to p, it also transfers for all pointers on all access paths a and
b. Instead, this is implicitly guaranteed by the ownership monotonicity rule, as
stated in Theorem 1.


Theorem 1 (Ownership transfer). If ownership is transferred from q to p,
then, by the ASSIGN rule and ownership monotonicity, ownership also transfers
between corresponding pointers on all access paths a and b: $O_{a'} = O_b$ and
$O_{b'} = 0$. (proof in the extended version [41])


Ownership and Aliasing: We saw in Sect. 5.1 that aliasing may cause situations 
in which, after ownership transfer, the same memory object has more than one 
owner. Theorem 2 states that this is not possible under ownership monotonicity. 


Theorem 2 (Soundness of pointer assignment under ownership mono- 
tonicity). Under ownership monotonicity, if all allocated memory objects have 
a unique owner before a pointer assignment, then they will also have a unique 
owner after the assignment. (proof in the extended version [41]) 


Intuitively, Theorem 2 enables a pointer to acquire ownership without hav- 
ing to consider aliases: after ownership transfer, this pointer will be the unique 
owner. The idea resembles that of strong updates [30]. 


Additional Access Paths: As a remark, it is possible for p and q to be accessible 
from other base variables in the program. In such cases, given that those access 
paths are not explicitly mentioned at the location of the ownership transfer, we 
do not generate new ownership variables for them. Consequently, their current 
ownership variables are left unchanged by default. 


Ownership Transfer Example. To illustrate the ASSIGN rule, we use the
singly-linked list example below, where we assume that p and q are both of type *mut
Node. Therefore, we have to consider the following four access paths: p, q,
(*p).next, and (*q).next. In SSA style, at each line in the example, we generate
new ownership variables (by incrementing their subscript) for the access paths
mentioned at that line. For the first assignment, ownership transfer can happen
between p and q, and between (*p).next and (*q).next, respectively. For the second
assignment, ownership can be transferred between (*p).next and (*q).next,
while p and q must retain their previous ownership.

    p = q;                   // O_{p1} = 0 ∧ O_{p2} + O_{q2} = O_{q1} ∧
                             // O_{(*p1).next} = 0 ∧ O_{(*p2).next} + O_{(*q2).next} = O_{(*q1).next}
    (*p).next = (*q).next;
                             // O_{p3} = O_{p2} ∧ O_{q3} = O_{q2} ∧
                             // O_{(*p2).next} = 0 ∧ O_{(*p3).next} + O_{(*q3).next} = O_{(*q2).next}
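For intuition, the two assignments can be written in safe Rust once the pointers are owning, assuming ownership transfer happens in each case. This is a minimal sketch with a hypothetical `Node` type and hypothetical function names; it treats the two assignments independently and is not CROWN's output.

```rust
#[derive(Debug)]
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

/// `p = q` with ownership transfer: q is left non-owning (None).
fn assign_with_transfer(p: &mut Option<Box<Node>>, q: &mut Option<Box<Node>>) {
    // The ASSIGN rule requires p to be non-owning before the assignment.
    assert!(p.is_none());
    // Moving the Box also moves ownership of everything reachable through `next`.
    *p = q.take();
}

/// `(*p).next = (*q).next` with ownership transfer of the tail only;
/// p and q themselves retain their previous ownership.
fn assign_next_with_transfer(p: &mut Node, q: &mut Node) {
    assert!(p.next.is_none());
    p.next = q.next.take();
}
```

In the first function, moving the head also moves the rest of the list, which mirrors the simultaneous transfer along access paths stated in Theorem 1.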


Besides generating ownership constraints for assignments, we must model
the ownership information for commonly used C standard functions such as malloc,
calloc, realloc, free, strcmp, memset, etc. Due to space constraints, more
details about these, as well as the rules for ownership monotonicity and
inter-procedural ownership analysis, are provided in the extended version [41].


Handling Conditionals and Loops. As mentioned in Sect. 3.2, we only anal- 
yse the body of loops once as it is sufficient to expose all the required ownership 
variables. For inductively defined data structures, while further unrolling of loop 
bodies increases the length of access paths, it does not expose any new struct 
fields (struct fields do not change ownership between loop iterations). 

To handle join points of control paths, we apply a variant of the SSA
construction algorithm [6], where different paths are merged via φ nodes. The value
of each ownership variable must be the same on all joined paths, or otherwise
the analysis fails.


5.3 Solving Ownership Constraints 


The ownership constraint system consists of a set of 3-variable linear constraints
of the form $O_v = O_w + O_u$, and 1-variable equality constraints $O_v = 0$ and
$O_v = 1$.


Definition 2 (Ownership constraint system). An ownership constraint
system $(P, \Delta, \Sigma, \Sigma_{\neg})$ consists of a set of ownership variables $P$ that can have
either value 0 or 1, a set of 3-variable equality constraints $\Delta \subseteq P \times P \times P$, and
two sets of 1-variable equality constraints, $\Sigma, \Sigma_{\neg} \subseteq P$. The equalities in $\Sigma$ are
of the form $x = 1$, whereas the equalities in $\Sigma_{\neg}$ are of the form $x = 0$.


Theorem 3 (Complexity of the ownership constraint solving). Decid- 
ing the satisfiability of the ownership constraint system in Definition 2 is NP- 
complete. (proof in the extended version [41]). 


We solve the ownership constraints by calling a SAT solver. The ownership 
constraints may have no solution. This happens when there is no ownership 
scheme that obeys the Rust ownership model and the ownership monotonicity 
property (which is stricter than the Rust model for some cases), or the original 
C program has a memory leak. In the case where the ownership constraints have 
more than one solution, we consider the first assignment returned by the SAT 
solver. 
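As an illustration of what the solver is asked to decide, the sketch below represents a small ownership constraint system and searches for a satisfying assignment by exhaustive enumeration. Brute force is used here in place of the SAT solver purely for readability and only scales to a handful of variables; the `System` and `Sum` types and the `solve` function are hypothetical names.

```rust
/// A 3-variable constraint x = y + z, with x, y, z given as variable indices.
struct Sum {
    x: usize,
    y: usize,
    z: usize,
}

/// An ownership constraint system over `num_vars` 0/1 variables.
struct System {
    num_vars: usize,
    sums: Vec<Sum>,    // 3-variable equality constraints
    ones: Vec<usize>,  // variables constrained to 1
    zeros: Vec<usize>, // variables constrained to 0
}

/// Exhaustively searches all 0/1 assignments and returns the first satisfying one.
fn solve(sys: &System) -> Option<Vec<u8>> {
    assert!(sys.num_vars < 32, "brute force is only meant for tiny systems");
    for bits in 0u32..(1u32 << sys.num_vars) {
        // Value of variable i under the candidate assignment encoded by `bits`.
        let val = |i: usize| ((bits >> i) & 1) as u8;
        let ok = sys.ones.iter().all(|&i| val(i) == 1)
            && sys.zeros.iter().all(|&i| val(i) == 0)
            && sys.sums.iter().all(|s| val(s.x) == val(s.y) + val(s.z));
        if ok {
            return Some((0..sys.num_vars).map(val).collect());
        }
    }
    None
}
```

For example, encoding p = q with variables 0..3 for p before, q before, p after, and q after, and additionally requiring q to own before and p to own after, leaves exactly one solution, in which ownership moves from q to p.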

Due to the complex Rust semantics, we do not formally prove that a satisfying 
assignment obeys the Rust ownership model. Instead, this check is performed 
after the translation by running the Rust compiler. 


5.4 Discussion on Ownership Monotonicity


As mentioned earlier in Sect. 5, ownership monotonicity is stricter than the Rust 
semantics, causing our analysis to potentially reject some ownership schemes 
that would otherwise be accepted by the Rust compiler. We identified two such 
scenarios: 


(i) Reference output parameter: This denotes a reference passed as a function
parameter, which acts as an output as it can be accessed from outside the
function (e.g. list in Fig. 1a). For such parameters, the base variable is non-owning
(as it is a reference) and mutable, whereas the pointers reachable from it may be
owning (see example in Fig. 1c, where (*node).head gets assigned a pointer to a
newly allocated node). We detect such situations and explicitly enable them. In
particular, we explicitly convert owning pointers p to &mut (*p) at the translation
stage.
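The shape of scenario (i) can be sketched as follows: the parameter itself is a non-owning mutable reference, while a field reachable from it becomes owning. The `List` and `Node` types and the function names are hypothetical stand-ins and are not taken from Fig. 1.

```rust
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

struct List {
    head: Option<Box<Node>>,
}

/// `list` is a non-owning, mutable base variable, yet `list.head` is owning.
fn push_front(list: &mut List, value: i32) {
    let old_head = list.head.take();
    list.head = Some(Box::new(Node { value, next: old_head }));
}

/// Collects the stored values from head to tail.
fn values(list: &List) -> Vec<i32> {
    let mut out = Vec::new();
    let mut cur = list.head.as_deref();
    while let Some(node) = cur {
        out.push(node.value);
        cur = node.next.as_deref();
    }
    out
}

/// The caller owns the list through a Box and lends it out as `&mut *p`.
fn caller_example() -> Vec<i32> {
    let mut p: Box<List> = Box::new(List { head: None });
    push_front(&mut *p, 1);
    push_front(&mut *p, 2);
    values(&p)
}
```

The expression `&mut *p` at the call sites corresponds to the conversion of an owning pointer p to &mut (*p) described above.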


(ii) Local borrows: The code below involving a mutable local borrow is not con- 
sidered valid by CROWN as it disobeys the ownership monotonicity: after the 
assignment, local_borrow is non-owning, whereas *local_borrow is owning. 


    let local_borrow = &mut n;
    *local_borrow = Box::new(1);


While we could explicitly handle the translation to local borrows, in order to
do so soundly, we would have to reason about lifetime information (e.g. CROWN
would have to check that there is no overlap between the lifetimes of different
mutable references to the same object). In this work, we chose not to do this and
instead leave it as future work (as also mentioned under limitations in Sect. 7). It
was observed in [13] that scenario (i) is much more prevalent than scenario (ii).
Additionally, we observed in our benchmarks that output parameters account
for 93% of mutable references (hence the inclusion of a special case enabling the
translation of scenario (i) in CROWN).
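For reference, the local-borrow snippet of scenario (ii) is valid Rust once n is declared. The sketch below is a self-contained version in which the declaration of n as a `Box<i32>` is an assumption made for illustration.

```rust
/// Runs the local-borrow snippet and returns the final value stored in `n`.
fn local_borrow_example() -> i32 {
    let mut n: Box<i32> = Box::new(0);
    // `local_borrow` does not own anything, but `*local_borrow` (the Box in `n`) does,
    // which is exactly the pattern that ownership monotonicity rules out.
    let local_borrow = &mut n;
    *local_borrow = Box::new(1);
    *n
}
```

The Rust compiler accepts this function, which shows that the rejection comes from ownership monotonicity rather than from the Rust ownership model.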


6 C to Rust Translation 


CROWN uses the results of the ownership, mutability and fatness analyses to 
perform the actual translation, which consists of retyping pointers (Sect. 6.1) 
and rewriting pointer uses (Sect. 6.2). 


6.1 Retyping Pointers 


As mentioned in Sect. 2.2, we do not attempt to translate array pointers to safe 
pointers. In the rest of the section, we focus on mutable, non-array pointers. 
The translation requires a global view of pointers’ ownership, whereas infor- 
mation inferred by the ownership analysis refers to individual program locations. 
For the purpose of translation, given that we refactor owning pointers into box 
pointers, a pointer is considered (globally) owning if it owns a memory object at 
any program location within its scope. Otherwise, it is (globally) non-owning. 

When retyping pointer fields of structs, we must consider the scope of the struct
declaration, which generally transcends the whole program. Within this scope,
each field is usually accessed from several base variables, which must all be taken
into consideration. For instance, consider the List declaration in Fig. 1b and two
variables l1 and l2 of type *mut List. Then, in order to determine the ownership
status of field next, we have to consider all the access paths to next
originating from both base variables l1 and l2.

The next table shows the retyping rules for mutable, non-array pointers,
where we wrap safe pointer types into Option to account for null pointer values:


           | Non-array pointers
Owning     | Option<Box<T>>
Non-owning | *mut T or Option<&mut T>


The non-owning pointers that are kept as raw pointers *mut T correspond 
to mutable local borrows. As explained in Sects. 5.4 and 7, CROWN doesn’t cur- 
rently handle the translation to mutable local borrows due to the fact that we do 
not have a lifetime analysis. Notably, this restriction does not apply to output 
parameters (which covers the majority of mutable references), where we trans- 
late to mutable references. The lack of a lifetime analysis means that we also 
can’t handle immutable local borrows, hence our translation’s focus on mutable 
pointers. 
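The retyping decision described in this section can be summarised by the following sketch. The `NewType` enum, the `retype` function, and the `is_output_parameter` flag are hypothetical names; the sketch only restates the table and the notion of a globally owning pointer, and does not reflect how CROWN is implemented.

```rust
/// The new type chosen for a mutable, non-array pointer.
#[derive(Debug, PartialEq, Eq)]
enum NewType {
    OptionBox,    // Option<Box<T>>
    OptionMutRef, // Option<&mut T>
    RawMut,       // *mut T
}

/// `owning_at` holds the ownership inferred at each program location in the pointer's scope.
fn retype(owning_at: &[bool], is_output_parameter: bool) -> NewType {
    // A pointer is globally owning if it owns a memory object at any location.
    let globally_owning = owning_at.iter().any(|&o| o);
    if globally_owning {
        NewType::OptionBox
    } else if is_output_parameter {
        // Non-owning output parameters become mutable references.
        NewType::OptionMutRef
    } else {
        // Other non-owning pointers (mutable local borrows) stay raw.
        NewType::RawMut
    }
}
```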


6.2 Rewriting Pointer Uses 


The rewrite of a pointer expression depends on its new type and the context 
in which it is used. For example, when rewriting q in p = q, the context will 
depend on the new type of p. Based on this new type, we can have four contexts: 
BoxCtxt which requires Box pointers, MutCtxt which requires &mut references, 
ConstCtxt which requires & references, and RawCtxt which requires raw pointers. 
For example, if p above is a Box pointer, then we rewrite q in a BoxCtxt. 
Then, the rewrite takes place according to the following table, where columns 
correspond to the new type of the pointer to be rewritten, and rows represent 
possible contexts.¹

          | Option<Box<T>>   | Option<&mut T>   | *mut T
BoxCtxt   | p.take()         | ⊥                | Some(Box::from_raw(p))
MutCtxt   | p.as_deref_mut() | p.as_deref_mut() | p.as_mut()
ConstCtxt | p.as_deref()     | p.as_deref()     | p.as_ref()
RawCtxt   | to_raw(&mut p)   | to_raw(&mut p)   | p

¹ The cell marked as ⊥ is not applicable due to our treatment of output parameters.


Our translation uses functions from the Rust standard library, as follows:


1. When Option<Box<T>> is passed to a BoxCtxt, we expect a move, and
consequently we use take to replace the value inside the option with None;

2. We use as_deref and as_deref_mut in order to not consume the original
option; instead, they create new options holding references to the original contents;

3. as_mut and as_ref convert raw pointers to references;
4. Box::from_raw converts raw pointers into Box pointers.
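The standard library calls listed above can be exercised in a small self-contained sketch. The `Node` type and the `demo` function are hypothetical; the raw pointer is created with `Box::into_raw` only so that the example has a valid pointer to work with.

```rust
struct Node {
    value: i32,
}

fn demo() -> (i32, i32, bool, i32) {
    // BoxCtxt with an Option<Box<T>>: `take` moves the Box out and leaves None behind.
    let mut p: Option<Box<Node>> = Some(Box::new(Node { value: 1 }));
    let mut moved: Option<Box<Node>> = p.take();
    let source_is_none = p.is_none();

    // MutCtxt: `as_deref_mut` yields Option<&mut Node> without consuming the option.
    if let Some(node) = moved.as_deref_mut() {
        node.value += 10;
    }
    // ConstCtxt: `as_deref` yields Option<&Node>.
    let seen = moved.as_deref().map(|n| n.value).unwrap_or(0);

    let raw: *mut Node = Box::into_raw(Box::new(Node { value: 5 }));
    // SAFETY: `raw` comes from Box::into_raw above, so it is valid and uniquely owned here.
    let (via_ref, from_raw_value) = unsafe {
        // Raw pointer in MutCtxt / ConstCtxt: `as_mut` / `as_ref` yield optional references.
        if let Some(node) = raw.as_mut() {
            node.value += 1;
        }
        let via_ref = raw.as_ref().map(|n| n.value).unwrap_or(0);
        // Raw pointer in BoxCtxt: `Box::from_raw` takes ownership back.
        let boxed: Option<Box<Node>> = Some(Box::from_raw(raw));
        (via_ref, boxed.map(|b| b.value).unwrap_or(0))
    };
    (seen, via_ref, source_is_none, from_raw_value)
}
```

Note that the raw-pointer conversions require an unsafe block, whereas the conversions on Option<Box<T>> are entirely safe.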


We also define the helper function to_raw that transforms safe pointers into
raw pointers:


    use std::ptr::null_mut;

    // Exposes the contents of the option as a raw pointer (null for None).
    fn to_raw<T>(b: &mut Option<Box<T>>) -> *mut T {
        b.as_deref_mut().map(|b| b as *mut T).unwrap_or(null_mut())
    }


Here, we explain to_raw for a Box argument (the explanation for &mut is the 
same because of the polymorphic nature of as_deref_mut): 


1. To convert Option<Box<T>>, we first mutably borrow the entire option, as
denoted by the mutable borrow argument of the helper function. This is
needed because Option<Box<T>> is not copyable, and it would otherwise be consumed;

2. as_deref_mut converts &mut Option<Box<T>> to Option<&mut T>;

3. map converts the reference inside the option into a raw pointer, yielding an
option of raw pointers;

4. Finally, unwrap_or returns the value inside Some, or a null pointer
std::ptr::null_mut() if the option is None.
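The same conversion can be written with each intermediate type spelled out, which may make the chain easier to follow. The function name `to_raw_stepwise` is hypothetical; its behaviour is intended to match the one-line to_raw above.

```rust
use std::ptr::null_mut;

/// Converts a borrowed Option<Box<T>> into a raw pointer, one step at a time.
fn to_raw_stepwise<T>(b: &mut Option<Box<T>>) -> *mut T {
    // &mut Option<Box<T>> -> Option<&mut T>, without consuming the option.
    let borrowed: Option<&mut T> = b.as_deref_mut();
    // Option<&mut T> -> Option<*mut T>.
    let raw: Option<*mut T> = borrowed.map(|r| r as *mut T);
    // Some(ptr) -> ptr, None -> null pointer.
    raw.unwrap_or(null_mut())
}
```

After the call, the original option still holds its Box, so the returned raw pointer does not own the object.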




Dereferences: When a pointer p is dereferenced as part of a larger expression
(e.g. (*p).next), we need an additional unwrap().
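As a small illustration, a nested dereference after retyping might look as follows. The `Node` type and the function name are hypothetical, and the use of as_deref before unwrap is one possible way to write it, not necessarily the exact form CROWN emits.

```rust
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

/// Reads `(*(*p).next).value` after `p` has been retyped to Option<Box<Node>>.
fn second_value(p: &Option<Box<Node>>) -> i32 {
    // Each dereference goes through the Option, hence the `unwrap()` calls.
    // As with a null dereference in C, this panics if a pointer on the path is None.
    p.as_deref().unwrap().next.as_deref().unwrap().value
}
```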

Box pointers check: Rust disallows the use of Box pointers after they have lost
their ownership. As this rule cannot be captured by the ownership analysis,
such situations are detected at the translation stage, and the culpable Box pointers
are reverted to raw pointers.

For brevity, we omitted the slightly different treatment of struct fields that 
are not of pointer type. 


7 Challenges of Handling Real-World Code 


We designed CROWN to be able to analyse and translate real-world code, which 
poses significant challenges. In this section, we discuss some of the engineering 
challenges of CROWN and its current limitations. 


7.1 Preprocessing 


During the transpilation of C libraries, c2rust treats each file as a separate com- 
pilation unit, which gets translated into a separate Rust module. Consequently, 
struct definitions are duplicated, and available function definitions are put in 
extern blocks [17]. We apply a preprocessing step similar to the resolve-imports 
tool of Laertes [17] that links those definitions across files. 


7.2 Limitations of the Ownership Analysis


There are a few C constructs and idioms that are not fully supported by 
our implementation, for which CROWN generates partial ownership constraints. 
CROWN’s translation will attempt to rewrite a variable as long as there exists
a constraint involving it. As a result, the translation is in theory neither sound 
nor complete: it may generate code that does not compile (though we have not 
observed this in practice for the benchmarks where CROWN produces a result — 
see Sect. 8) and it may leave some pointers as raw pointers resulting in a less than 
optimal translation. We list below the cases when such a scenario may happen. 


Certain Unsafe C Constructs. For type casts, we only generate ownership trans- 
fer constraints for head pointers; for unions we assume that they contain no 
pointer fields and consequently, we generate no constraints; similarly, we gener- 
ate no constraints for variadic arguments. We noticed that unions and variadic 
arguments may cause our tool to crash (e.g. three of the benchmarks in [17], as 
mentioned in Sect. 8). Those crashes happen when analysing access paths that
contain dereferences of union fields (where we assumed no pointer fields), and 
when analysing calls to functions with variadic arguments where a pointer is 
passed as argument. 


Function Pointers. CROWN does not generate any constraints for them. 


Non-standard Memory Management in C Libraries. Certain C libraries wrap 
malloc and free, often with static function pointers (pointers to the allocator and
deallocator are stored in static variables), or function pointers in structs. CROWN
does not generate any constraints in such scenarios. In C, it is also possible to 
use malloc to allocate a large piece of memory, and then split it into several 
sub-regions assigned to different pointers. In our ownership analysis, only one 
pointer can gain ownership of the memory allocated by a call to malloc. Another 
C idiom that we don’t fully support occurs when certain pointers can point to 
either heap allocated objects, or statically allocated stack arrays. CROWN gener- 
ates ownership constraints only for the heap and, consequently, those variables 
will be left under-constrained. 


7.3 Other Limitations of CROWN 


Array Pointers. For array pointers, although CROWN infers the correct ownership
information, it does not generate the metadata required to synthesise Rust
code.


Mutable Local Borrows. As explained in the last paragraph of Sect. 6.1, CROWN 
does not translate mutable non-owning pointers to local mutable references as 
this requires dedicated analysis of lifetimes. Note that CROWN does however 
generate mutable references for output parameters. 


Access Paths that Break Ownership Monotonicity. As discussed in Sect. 5.4, own- 
ership monotonicity may be stricter in certain cases than Rust’s semantics. 


8 Experimental Evaluation


We implement CROWN on top of the Rust compiler, version nightly- 
2023-01-26. We use c2rust with version 0.16.0. For the SAT solver, we rely 
on a Rust-binding of z3 [20] with version 0.11.2. We run all our experiments 
on a MacBook Pro with an Apple M1 chip, with 8 cores (4 performance and 4 
efficiency). The computer has 16 GB RAM and runs macOS Monterey 12.5.1. 


Benchmark Selection. To evaluate the utility of CROWN, we collected a
benchmark suite of 20 programs (Table 1). These include benchmarks from
the artifact [16] accompanying Laertes [17] (marked by * in Table 1)², and
additionally 8 real-world projects (binn, brotli, buffer, heman, json.h, libtree,
lodepng, rgba) together with 4 commonly used data structure libraries (avl,
bst, ht, quadtree).


Functional and Non-functional Guarantees. With respect to functional 
properties, we want the original program and the refactored program to be 
observationally equivalent, i.e. for each input they produce the same output. 
We empirically validated this using all the available test suites (i.e. for libtree, 
rgba, quadtree, urlparser, genann, buffer in Table 1). All the test suites
continue to pass after the translation. For non-functional properties, we intend to
preserve memory usage and CPU time, i.e. we don’t want our translation to 
introduce runtime overhead. We also validated this using the test suites. 


Table 1. Benchmarks information 


Benchmark         | Files | Structs | Functions | LOC
avl               | 1     | 2       | 11        | 229
binn              | 1     | 5       | 165       | 4426
brotli            | 30    | 237     | 867       | 537723
bst               | 1     | 1       | 6         | 154
buffer            | 2     | 3       | 42        | 1207
bzip2*            | 9     | 39      | 126       | 14829
genann*           | 6     | 10      | 27        | 2410
heman             | 24    | 52      | 302       | 13762
ht                | 1     | 3       | 10        | 264
json.h            | 1     | 13      | 53        | 3860
libcsv*           | 1     | 6       | 23        | 976
libtree           | 1     | 18      | 32        | 2610
libzahl*          | 49    | 65      | 108       | 4655
lil*              | 2     | 9       | 136       | 5670
lodepng           | 1     | 19      | 236       | 14153
quadtree          | 5     | 14      | 31        | 1216
rgba              | 2     | 3       | 19        | 1855
robotfindskitten* | 1     | 8       | 18        | 1508
tulipindicators*  | 111   | 18      | 229       | 22363
urlparser*        | 1     | 1       | 21        | 1379


8.1 Research Questions 


We aim to answer the following research questions.


² We excluded json-c, optipng, and tinycc, where CROWN crashes because of the use of
unions and variadic arguments, as discussed in Sect. 7. Additional programs (qsort,
grabc, xzoom, snudown, tmux, libxml2) are mentioned in the paper [17] but are either
missing or incomplete in the artifact [16].

RQ1. How many raw pointers/pointer uses can CROWN translate to safe
pointers/pointer uses?

RQ2. How does CROWN’s result compare with the state-of-the-art [17]?

RQ3. What is the runtime performance of CROWN?


RQ 1: Unsafe pointer reduction. In order to judge CROWN’s efficacy, we
measure the reduction rate of raw pointer declarations and uses. This is a direct
indicator of the improvement in safety, as safe pointers are always checked by
the Rust compiler (even inside unsafe regions). As previously mentioned, we
focus on mutable non-array pointers. The results are presented in Table 2, where
#ptrs counts the number of raw pointer declarations in a given benchmark,
#uses counts the number of times raw pointers are used, and the Laertes
and Crown headers denote the reduction rates of the number of raw pointers
and raw pointer uses achieved by the two tools, respectively. For instance, for
benchmark avl, the rate of 100% means that all raw pointer declarations and
their uses are translated into safe ones. Note that the “-” symbols on the row
corresponding to robotfindskitten are due to the fact that the benchmark
contains 0 raw pointer uses.

The median reduction rates achieved by CROWN for raw pointers and raw
pointer uses are 37.3% and 62.1%, respectively. CROWN achieves a 100% reduction
rate for many non-trivial data structures (avl, bst, buffer, ht), as well
as a 100% reduction of raw pointer uses for rgba. For brotli, a lossless data
compression algorithm developed by Google, which is our largest benchmark,
CROWN achieves reduction rates of 21.4% and 20.9%, respectively. The relatively
low reduction rates for brotli and a few other benchmarks (tulipindicators,
lodepng, bzip2, genann, libzahl) are due to their use of non-standard memory
management strategies (discussed in detail in Sect. 7).
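The reported medians can be recomputed from the Crown columns of Table 2 with a short sketch. The function names are hypothetical; the row for robotfindskitten is left out of the use rates because that benchmark has no raw pointer uses.

```rust
/// Median of a non-empty list of percentages; averages the two middle values for an even count.
fn median(values: &[f64]) -> f64 {
    let mut v = values.to_vec();
    v.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = v.len();
    if n % 2 == 1 {
        v[n / 2]
    } else {
        (v[n / 2 - 1] + v[n / 2]) / 2.0
    }
}

/// Returns the median CROWN reduction rates for (raw pointers, raw pointer uses).
fn crown_medians() -> (f64, f64) {
    // Crown column for #ptrs, in the row order of Table 2 (20 benchmarks).
    let ptr_rates = [
        100.0, 65.0, 21.4, 100.0, 100.0, 26.2, 7.1, 35.0, 100.0, 23.4,
        70.0, 39.6, 16.1, 18.8, 44.9, 42.4, 83.3, 0.0, 0.7, 11.1,
    ];
    // Crown column for #uses (19 benchmarks; robotfindskitten has no uses).
    let use_rates = [
        100.0, 71.3, 20.9, 100.0, 100.0, 3.7, 15.0, 60.2, 100.0, 62.1,
        97.9, 62.1, 16.8, 69.4, 37.7, 48.7, 100.0, 0.0, 45.0,
    ];
    (median(&ptr_rates), median(&use_rates))
}
```

With 20 pointer rates the median is the mean of the two middle values (35.0 and 39.6), and with 19 use rates it is the tenth value in sorted order.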

Notably, all the translated benchmarks compile under the aforementioned 
Rust compiler version. As a check of semantics preservation, for the benchmarks 
that provide test suites (libtree, rgba, quadtree, urlparser, genann, buffer), 
our translated benchmarks pass all the provided tests. 


RQ 2: Comparing with state-of-the-art. The comparison of CROWN with 
Laertes [17] is also shown in Table 2, with bold font highlighting better results. 
The data on Laertes is either directly extracted from the artifact [16] or has 
been confirmed by the authors through private correspondence. We can see that 
CROWN outperforms the state-of-the-art (often by a significant degree) in most 
cases, with lodepng being the only exception, where we suspect that the reason 
also lies with non-standard memory management strategies mentioned before. 
Laertes is less affected by this as it does not rely on ownership analysis. 


RQ 3: Runtime performance. Although our analysis relies on solving a constraint
satisfaction problem that is proven to be NP-complete, in practice the
runtime performance of CROWN is consistently high. The execution time of the
analysis and the rewrite for the whole benchmark suite is within 60 s (where the
execution time for our largest benchmark, brotli, is under 10 s).




Table 2. Reduction of (mutable, non-array) raw pointer declarations and uses 


Benchmark         | #ptrs | Laertes | Crown  | #uses | Laertes | Crown
avl               | 8     | 0.0%    | 100.0% | 41    | 0.0%    | 100.0%
binn              | 103   | 46.6%   | 65.0%  | 247   | 62.3%   | 71.3%
brotli            | 846   | 0.0%    | 21.4%  | 3686  | 0.0%    | 20.9%
bst               | 5     | 0.0%    | 100.0% | 22    | 0.0%    | 100.0%
buffer            | 38    | 0.0%    | 100.0% | 56    | 0.0%    | 100.0%
bzip2*            | 126   | 14.3%   | 26.2%  | 2946  | 2.2%    | 3.7%
genann*           | 28    | 0.0%    | 7.1%   | 160   | 0.0%    | 15.0%
heman             | 360   | 30.3%   | 35.0%  | 926   | 50.2%   | 60.2%
ht                | 6     | 33.3%   | 100.0% | 28    | 42.9%   | 100.0%
json.h            | 128   | 12.3%   | 23.4%  | 647   | 11.2%   | 62.1%
libcsv*           | 20    | 65.0%   | 70.0%  | 141   | 97.9%   | 97.9%
libtree           | 48    | 29.2%   | 39.6%  | 227   | 33.0%   | 62.1%
libzahl*          | 87    | 2.2%    | 16.1%  | 279   | 4.1%    | 16.8%
lil*              | 202   | 9.2%    | 18.8%  | 1018  | 51.4%   | 69.4%
lodepng           | 227   | 46.3%   | 44.9%  | 1232  | 40.4%   | 37.7%
quadtree          | 33    | 0.0%    | 42.4%  | 117   | 0.0%    | 48.7%
rgba              | 6     | 83.3%   | 83.3%  | 12    | 100.0%  | 100.0%
robotfindskitten* | 1     | 0.0%    | 0.0%   | 0     | -       | -
tulipindicators*  | 134   | 0.0%    | 0.7%   | 625   | 0.0%    | 0.0%
urlparser*        | 9     | 0.0%    | 11.1%  | 40    | 0.0%    | 45.0%


9 Related Works 


Ownership Discussion. Ownership has been used in OO programming to 
enable controlled aliasing by restricting object graphs underlying the runtime 
heap [11,12] with efforts made in the automatic inference of ownership infor- 
mation [1,4,39], and applications of ownership to memory management [5,42]. 
Similarly, the concept of ownership has also been applied to analyse C/C++ pro- 
grams. Heine et al. [24] inferred pointer ownership information for detecting 
memory leaks. Ravitch et al. [37] apply static analysis to infer ownership for 
automatic library binding generation. Given the different application domains,
each of these works makes different assumptions. Heine et al. [24] assume
that indirectly-accessed pointers (i.e. any pointer accessed through a path, like
(*p).next) cannot acquire ownership, whereas Ravitch et al. [37] assume that
all struct fields are owning unless explicitly annotated. We took from [24] its
handling of flow sensitivity, but enhanced it with the analysis of nested point- 
ers and inductively defined data structures, which we found to be essential for 
translating real-world code. The analysis in [24] assigns a default “non-owning” 
status to all indirectly accessed pointers. This rules out many interesting data 
structures such as linked lists, trees, hash tables, etc., and commonly used idioms 
such as passing by reference. Conversely, in our work, we rely on a strengthening 
assumption about the Rust ownership model, which allows handling the aforementioned
scenarios and data structures. Lastly, the idea of ownership is also
broadly applied in concurrent separation logic [7–9, 19, 38]. However, these works
are not aimed at automatic ownership inference.


Rust Verification. The separation logic based reasoning framework Iris [28] 
was used to formalise the Rust type system [27], and verify Rust programs [34]. 
While these works cover unsafe Rust fragments, they are not fully automatic. 
When restricting reasoning to only safe Rust, RustHorn [35] gives a first-order 
logic formulation of the behavior of Rust code, which is amenable to fully automatic
verification, while Prusti [3] leverages Rust compiler information to generate
separation logic verification conditions that are discharged by Viper [36]. In
the current work, we provide an automatic ownership analysis for unsafe Rust 
programs. 


Type Qualifiers. Type qualifiers are a lightweight, practical mechanism for 
specifying and checking properties not captured by traditional type systems. A 
general flow-insensitive type qualifier framework has been proposed [21], with 
subsequent applications analysing Java reference mutability [22,25] and C array 
bounds [32]. We adapted these works to Rust for our mutability and fatness 
analyses, respectively. 


C to Rust Translation. We have already discussed c2rust [26], which is an 
industrial strength tool that converts C to Rust syntax. c2rust does not attempt 
to fix unsafe features such as raw pointers and the programs it generates are 
always annotated as unsafe. Nevertheless, it forms the basis of other translation
efforts. CRustS [31] applies AST-based code transformations to remove superflu- 
ous unsafe labelling generated by c2rust. But it does not fix the unsafe features 
either. Laertes [17] is the first tool that is actually able to automatically reduce 
the presence of unsafe code. It uses the Rust compiler as a blackbox oracle
and searches for code changes that remove raw pointers, which is different from
CROWN’s approach (see Sect. 8 for an experimental comparison). The subsequent 
work [15] develops an evaluation methodology for studying the limitations of 
existing techniques that translate unsafe raw pointers to safe Rust references. 
The work adopts a new concept of ‘pseudo safety’, under which semantics preser- 
vation of the original programs is no longer guaranteed. As explained in Sect. 8, 
in our work, we aim to maintain semantic equivalence. 


10 Conclusion 


We devised an ownership analysis for Rust programs translated by c2rust that 
is scalable (handling half a million LOC in less than 10s) and precise (han- 
dling inductive data structures) thanks to a strengthening of the Rust ownership 
model, which we call ownership monotonicity. Based on this new analysis, we 
prototyped a refactoring tool for translating C programs into Rust programs. 
Our experimental evaluation shows that the proposed approach handles real- 
world benchmarks and outperforms the state-of-the-art. 


Abstract. R2U2 is a modular runtime verification framework capable of monitoring
sets of specifications in real time and in resource-constrained environments.
Such environments demand that a runtime monitor be fast, easily integratable,
accessible to domain experts, and have predictable resource requirements.
Version 3.0 adds new features to R2U2 and its associated suite of tools that meet
these needs, including a new front-end compiler that accepts a custom specification
language, a GUI for resource estimation, and improvements to R2U2’s
internal architecture.


1 Tool Overview 


R2U2 (Realizable Responsive Unobtrusive Unit) is a modular framework for hardware
(FPGA) and software (C and C++) real-time runtime verification (RV). R2U2
runs online, during system execution, with minimal overhead. (It also runs offline, over
simulated data streams or recorded data logs.) R2U2 is stream-based; given a runtime
requirement $\varphi$ and an input computation $\pi$ of sensor and software values at each
timestamp $i$, R2U2 returns the verdict (true or false) for all $i$ as to whether $\pi, i \models \varphi$.
We call this output stream an execution sequence [34]; it is a stream of two-tuples
(verdict, time) for every time $i$. R2U2 encodes specifications as observers (a set of
which we call a configuration) via an optimized algorithm with published proofs of
correctness, time, and space [18, 20, 34].
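To give a feel for what an execution sequence looks like, the sketch below evaluates a single bounded "globally" operator over a recorded Boolean stream and emits (verdict, time) pairs. It is a simplified offline illustration under our own assumptions, not R2U2's observer algorithm: it only reports timestamps whose whole interval lies inside the trace, and the function name is hypothetical.

```rust
/// Evaluates "the input holds at every step in [i + lb, i + ub]" for each timestamp i.
/// Returns (verdict, time) pairs for the timestamps whose interval fits in the trace.
fn globally(trace: &[bool], lb: usize, ub: usize) -> Vec<(bool, usize)> {
    assert!(lb <= ub);
    let mut out = Vec::new();
    for i in 0..trace.len() {
        // Stop once the interval would run past the end of the recorded data.
        if i + ub >= trace.len() {
            break;
        }
        let verdict = trace[i + lb..=i + ub].iter().all(|&p| p);
        out.push((verdict, i));
    }
    out
}
```

For the trace [true, true, false, true, true] and the interval [0, 1], the sketch reports true at times 0 and 3 and false at times 1 and 2; time 4 is left undecided because its interval extends beyond the trace.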

Figure 1 depicts a standard R2U2 workflow. To integrate R2U2 into a target system, 
we first need a validated set of runtime requirements. Given the system’s resource con- 
straints, the Configuration Compiler for Property Organization (C2PO) creates an opti- 
mized encoding of the input set of requirements as an R2U2 configuration. Users can 


Fig. 1. Workflow for verifying a specification using R2U2. Red shaded boxes denote runtime 
components and blue shaded boxes denote design-time components. Note that for validation, the 
runtime components can run offline, e.g., by replacing the data stream with a log file of simulated 
data. Users formalize their system requirements as MLTL formulas within a C2PO specification, 
use C2PO to generate an R2U2 configuration, then monitor the verdicts R2U2 outputs based on 
the configuration and data stream. (Color figure online) 


swap configurations monitored by R2U2 at runtime, during system execution, based on 
system state, mission phase, or to upgrade the specification version — all without recom- 
piling and redeploying the R2U2 engine, a key feature for systems that require onerous 
code change certifications, or e.g., systems that need to be launched into space and then 
dynamically updated as their hardware degrades. 

R2U2 fills the unique gap in the RV community described by its name [39]: 


REALIZABILITY R2U2 analyzes generic, re-usable specifications in Mission-Time 
Linear Temporal Logic (MLTL) [20,34], a variant of LTL with closed integer- 
bounded intervals on the temporal operators. MLTL excels at capturing require- 
ments conceptualized as timelines, as is common in aerospace operational concepts, 
e.g., [1,11,45]. At its core, R2U2 specifications combine either a future-time or 
past-time MLTL formula with simple signal comparators [34]. New optional exten- 
sions provide additional features, such as simple set-level reasoning [5]. R2U2’s 
hardware implementation, written in VHDL, avoids overburdening limited com- 
puting resources by utilizing Field Programmable Gate Arrays (FPGAs) to mon- 
itor in parallel with the system under absolute timing guarantees. R2U2’s two 
software implementations avoid hardware integration and software instrumentation 
challenges at the cost of (minimal) compute resources on the host system and are 
designed to be suitable for different environments. The C version forgoes mem- 
ory allocation and bounds checking to provide fast deterministic results for real- 
time controllers under stringent certifiability criteria; alternatively, the C++ ver- 
sion makes full use of dynamic memory, templates, and runtime checks for max- 
imum flexibility without monitor tuning. Additionally, the implementations differ 
significantly in architecture to provide fault independence. The three monitor imple- 
mentations enable on-board (embedded) and on-ground execution, integration with 
multiple human-machine interaction paradigms, cross-validation, or triple modular 
redundancy voting strategies to increase system trust. 

RESPONSIVENESS R2U2 provides two levels of responsiveness. At a system level, 
runtime reconfiguration of the monitor without a lengthy re-compilation (and re- 
certification) process keeps R2U2 responsive to the system’s needs even as the 
mission, platform, or requirements evolve. At a specification level, R2U2’s asyn- 
chronous (event-triggered) observers provably report both true and false ver- 
dicts (rather than only reporting property violations) in the first timestamp where 
there is sufficient information to evaluate $\pi, i \models \varphi$, thus monitoring integrity,
safety, and security requirements in real-time. Since the monitor’s response time 
is a function of the specification and known a priori, higher-level autonomous sys- 
tem health and decision-making controllers can rely on R2U2 verdicts to provide a 
tight bound on mitigation triggering or other reactive behaviors. 

UNOBTRUSIVENESS R2U2’s multi-architecture, multi-platform design enables effec- 
tive runtime verification while respecting crucial unobtrusiveness properties of 
embedded systems, including functionality (no change in behavior), certifiability 
(bounded time and memory under safety cases), timing (no interference with timing 
guarantees), and tolerances (respect constraints on size, weight, power, bandwidth, 
and overhead). R2U2 obeys unobtrusiveness constraints, provably fitting into tight 
resource limits and operational constraints frequently encountered in space mis- 
sions. It can operate without code instrumentation or insight into black-box sub- 
components such as ITAR, restricted, or closed-source modules [29]. 


User Base. After an extensive survey of all currently-available verification tools,
NASA’s Lunar Gateway Vehicle System Manager (VSM) team selected R2U2 for operational
verification [8-10]; R2U2 is currently operating in the NASA core Flight System/core
Flight Executive (cFS/cFE) [28] VSM environment. R2U2 is embedded in the space left
over on the FPGA controlling the knee of NASA’s Robonaut2 to provide real-time fault
disambiguation [18], interfacing via the Robot Operating System (ROS) [31].
R2U2 is running on a UAS Traffic Management (UTM) system [5], where it recently
detected a flight-plan timing fault. JAXA is running R2U2 on a 2021 autonomous satellite
mission with a requirement for a provable memory bound of 200KB [30]. R2U2
recently verified a CubeSat communications system [24], an open-source UAS [16], a
sounding rocket [15], and a high-altitude balloon [23]. The CySat-I satellite uses R2U2
for autonomous fault recovery [2]. In the recent past, R2U2 was used in NASA’s Autonomy
Operating System (AOS) for UAS [22] (where it flew on NASA’s S1000 octocopter
[21]), the NASA Swift UAS [13,34,36,43], and the NASA DragonEye UAS [41,44].
R2U2 aided in NASA embedded system battery prognostics [42] and a case study on
small satellites and landers [35]. R2U2 has also proven useful for monitoring and diagnosis
of security threats on-board NASA UAS like the DragonEye [27,40]. R2U2 was
cataloged by the user community in a 2018 taxonomy of RV tools [12,39], and appeared
in a 2020 Institute of Information Security (ETH Zürich, Switzerland) case study [33].
R2U2 is open-source, dual licensed under MIT and Apache-2.0.


2 Compiler and Specification Language 


Specification is a notoriously difficult aspect of RV [37]; verification results are only 
meaningful if the input specifications are correct and complete with respect to the sys- 
tem requirements. An RV engine is only usable if system engineers can validate that it 
monitors its given requirements as they expect, so they can clearly explain when and 
why different RV verdicts occur. In consultation with outside groups using R2U2 on
real systems [8,14,30], we developed a new specification language and an accompanying
formula-set compiler. The language’s and compiler’s features make specifications
easier to read and write, improving user productivity and easing validation to address
the challenges of specification in RV.

Table 1. Overview of changes to the R2U2 specification syntax for a basic temperature limit
requirement, where Temp is located at index 0 of the input signal vector. This is not an exhaustive
comparison but covers directly equivalent features, while Fig. 2 and the remainder of Sect. 2 detail
new capabilities.

| Feature | Previous Syntax [39] | C2PO Syntax |
|---|---|---|
| Declare Signal | Temp = 0; (fix name to signal index) | INPUT Temp: float; (declare name/type, signal index handled separately) |
| Define Macro | N/A | DEFINE Temp_Limit = 97; (improves readability and maintenance) |
| Define Struct | N/A | STRUCT [declaration illegible] (enables data organization) |
| Atomic Checker | OVERTEMP = float(Temp) > 97; (in-lined constants, signal type determined by function name) | ATOMIC OVERTEMP = Temp > Temp_Limit; (all declared names available, uses known signal types) |
| MLTL Formula | G[0,3] !OVERTEMP; | FTSPEC G[0,3] !OVERTEMP; (requires temporal tense declared: FTSPEC or PTSPEC) |


2.1 New Specification Language 


Previous versions of R2U2 used a specification language derived from the implemen- 
tation of the hardware runtime engine. While sufficiently expressive for the creation 
of R2U2 configurations, it utilized a restricted syntax that supported only basic MLTL 
operators and single-operator expressions over non-Boolean data types. Writing spec- 
ifications that are transparent and easy to validate could be difficult without in-depth 
knowledge of R2U2’s architecture [17]. 

The new SMV-inspired [26] specification language allows users the option to write
specifications more naturally with support for compound expressions over complex data
types including sets and C-like structs as well as sections for defining structs, variables,
macros, and MLTL formulas. C2PO supports Boolean, struct, and parametric set types
with configurable integer and floating point types. To run R2U2 in software, users select
a C standard type for each of the integer and float types, e.g., an unsigned 16-bit integer
(uint16_t) and double-precision floating point (double). If targeting hardware
(FPGA implementation), users can configure integer and float types to a bit-width
supported by the target system. Table 1 presents a comparison between the old [39] and
new syntaxes and Fig. 2 presents a sample file for monitoring a request-handling system.

     1  STRUCT
     2    Request: { state: int; time_active: float; };
     3    Arbiter: { ReqSet: set<Request>; };
     4
     5  INPUT
     6    st0, st1, st2, st3: int;
     7    ta0, ta1, ta2, ta3: float;
     8
     9  DEFINE
    10    WT := 0; GR := 1; RJ := 2; -- WAIT, GRANT, REJECT
    11
    12    rq0 := Request(st0, ta0); rq1 := Request(st1, ta1);
    13    rq2 := Request(st2, ta2); rq3 := Request(st3, ta3);
    14
    15    Arb0 := Arbiter({rq0, rq1}); Arb1 := Arbiter({rq2, rq3});
    16    ArbSet := {Arb0, Arb1};
    17
    18  FTSPEC
    19    (rq0.time_active - rq1.time_active) < 10.0 &&
    20    (rq1.time_active - rq0.time_active) < 10.0;
    21
    22    foreach(arb: ArbSet) (
    23      foreach(rq: arb.ReqSet) (
    24        (rq.state == WT) U[0,5] (rq.state == GR || rq.state == RJ)
    25      )
    26    );

Fig. 2. Sample C2PO specification file using structs (lines 2-3, 12-13), sets (lines 3, 15-16), and
set aggregation operators (lines 22-23). The specification on lines 19-20 captures the English
requirement, “The active times for rq0 and rq1 shall differ by no more than 10.0 s,” and the
specification on lines 22-26 captures the English requirement, “For each request r of each arbiter
in ArbSet, r’s status shall be GRANT or REJECT within the next 5 s and until then shall be
WAITING.”

To create an R2U2 configuration, C2PO generates an Abstract Syntax Tree (AST) 
representation of the input, performs type checking, applies optimizations and rewriting 
rules, then outputs the corresponding R2U2 configuration. R2U2 does not use automata 
to encode temporal logic observers (as reported erroneously elsewhere [12]); instead 
C2PO traverses the AST to produce assembly-like imperative evaluation instructions 
for the R2U2 monitor to execute at runtime.

In order to meet the demands of a wide range of systems, R2U2 Version 3.0 includes 
many optional features that are specific to one of the three implementations that can 
be enabled during system integration. For example, the Booleanizer module computes 
arbitrary non-Boolean expressions in the C implementation of R2U2, but this feature is 
not an option in the C++ or hardware implementations. C2PO allows users to enable 
or disable such features according to the capabilities of their target systems and chosen 
R2U2 implementation. 

2.2 Assume-Guarantee Contract Support


Assume-Guarantee Contracts (AGCs) provide a template for structuring and validating
complex requirements in aerospace operational concepts [3]. AGCs feature a guard
or trigger clause called the “assumption” and a system invariant called the “guarantee”;
they have been used to structure both English and formal (e.g., temporal logic)
requirements by projects including the NASA Lunar Gateway Vehicle System Manager [10].
R2U2 V3.0 now directly supports AGCs with an input syntax for expressing
AGCs in C2PO and an output format for R2U2 that provides granular interpretation of
verdicts, as presented in [17]. The input syntax for declaring an AGC is
assumption => guarantee, where the semantics for this logical implication provides three
distinct cases: the AGC is “inactive” if the assumption is false, “true” if both the
assumption and guarantee are true, and “false” otherwise. When the optional AGC feature is
enabled, R2U2 produces three-valued verdicts to represent the state of the AGCs in a
clear format; otherwise R2U2 interprets logical implications in the standard way (where
false => true results in the verdict true rather than inactive).
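The three cases can be written down as a small truth function. The following Python sketch is only an illustration of the semantics described above; the names AgcVerdict, agc_verdict, and implication are hypothetical and are not part of R2U2 or C2PO.

```python
from enum import Enum


class AgcVerdict(Enum):
    """Hypothetical three-valued verdict for an assume-guarantee contract."""
    INACTIVE = "inactive"
    TRUE = "true"
    FALSE = "false"


def agc_verdict(assumption: bool, guarantee: bool) -> AgcVerdict:
    """Three-valued reading of `assumption => guarantee` (AGC feature enabled)."""
    if not assumption:
        return AgcVerdict.INACTIVE  # the contract is not triggered
    return AgcVerdict.TRUE if guarantee else AgcVerdict.FALSE


def implication(assumption: bool, guarantee: bool) -> bool:
    """Standard two-valued implication (AGC feature disabled)."""
    return (not assumption) or guarantee
```

The two functions differ only when the assumption is false: the standard implication reports true, while the AGC reading reports inactive.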


2.3 Set Aggregation 


A common pattern in real-world specifications applies an identical formula to vari- 
ous input signals, such as testing all temperature sensors for an overheat condition. A 
naive encoding of these specifications in MLTL can be excessively large to the point 
of obscuring intent while providing ample opportunity for copy-paste errors, typos, or 
incomplete updates to variables — all of which are difficult for humans to spot dur- 
ing validation. C2PO mitigates this issue by supporting set aggregation operators that 
compactly encode these expressions as sets of streams with a predicate applied to each 
element [14]. 

To illustrate, consider the specification in Fig. 2. The direct encoding of this speci- 
fication without the “foreach” operator is 


(rq0.state == WT) U[0,5] (rq0.state == GR || rq0.state == RJ) &&
(rq1.state == WT) U[0,5] (rq1.state == GR || rq1.state == RJ) &&
(rq2.state == WT) U[0,5] (rq2.state == GR || rq2.state == RJ) &&
(rq3.state == WT) U[0,5] (rq3.state == GR || rq3.state == RJ)


Contrast this with the more compact encoding using the “foreach” operator on lines 22-26
in Fig. 2. The latter retains the intent of the English-level requirement while being
semantically equivalent to the direct encoding. This concise representation both eases 
validation by improving readability and reduces the potential for errors by avoiding 
replicated values that require simultaneous updates. 
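One way to picture the equivalence is as a textual expansion of the nested “foreach” into a conjunction. The Python sketch below is an assumption-laden illustration, not C2PO's algorithm: expand_foreach and until_spec are hypothetical helpers, formulas are plain strings, and the element names follow Fig. 2.

```python
from typing import Callable, Iterable


def expand_foreach(elements: Iterable[str], body: Callable[[str], str]) -> str:
    """Expand foreach(x: elements)(body(x)) into a conjunction over the elements."""
    return " &&\n".join(body(e) for e in elements)


# Set contents as declared in Fig. 2 (lines 15-16).
arb_set = {"Arb0": ["rq0", "rq1"], "Arb1": ["rq2", "rq3"]}


def until_spec(rq: str) -> str:
    """Body of the inner foreach (Fig. 2, line 24) instantiated for one request."""
    return f"({rq}.state == WT) U[0,5] ({rq}.state == GR || {rq}.state == RJ)"


# Outer foreach over arbiters, inner foreach over each arbiter's requests.
direct = expand_foreach(arb_set, lambda arb: expand_foreach(arb_set[arb], until_spec))
print(direct)
```

Adding a request to an arbiter changes one set literal in the compact form, whereas the direct encoding needs a new, manually copied conjunct.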


2.4 Common Subexpression Elimination 


C2PO uses an AST as the intermediate representation of its input and can therefore
use optimization techniques common in compiler design such as Common Subexpression
Elimination (CSE) [6]. Similarly to applying the isomorphism elimination rule for
Binary Decision Trees [4], CSE prunes all but one instance of any identical AST subtrees,
reusing the result from that subtree for monitoring multiple requirements without
wasting memory and execution time by representing it redundantly. Analysis of CSE on
randomly-generated MLTL requirements resulted in a speed-up of 37% and required 4.3%
less memory [18]. We expect larger savings in human-authored requirement specifications,
however, due to reuse of both common specification patterns and structures in the
underlying system. For example, a non-trivial subexpression might represent a system’s
confidence in its navigational fix and many specifications might depend on the navigation
state, thus re-using this subexpression.
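A common way to realize this kind of sharing is to give every structurally identical subtree the same identifier (hash-consing). The Python sketch below illustrates that general technique under stated assumptions; it is not C2PO's implementation, and the tuple-based AST encoding, the function name, and the signal names (gps_fix, imu_ok, landed) are hypothetical.

```python
def eliminate_common_subexpressions(roots):
    """Return (table, root_ids) where identical subtrees share one table entry.

    Each AST node is a tuple (operator, child, child, ...); leaves are 1-tuples.
    """
    ids = {}    # structural key -> node id
    table = []  # node id -> (operator, tuple of child ids)

    def visit(node):
        op, *children = node
        key = (op, tuple(visit(c) for c in children))
        if key not in ids:  # first time this exact subtree is seen
            ids[key] = len(table)
            table.append(key)
        return ids[key]

    root_ids = [visit(r) for r in roots]
    return table, root_ids


# Two specifications that both depend on the same navigation subexpression.
nav_ok = ("&&", ("gps_fix",), ("imu_ok",))
spec1 = ("G[0,3]", nav_ok)
spec2 = ("U[0,5]", nav_ok, ("landed",))

table, root_ids = eliminate_common_subexpressions([spec1, spec2])
print(len(table))  # 6 shared nodes instead of 9 when each spec is stored separately
```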


3 Resource Estimation GUI 


As R2U2’s user base expands, so does the variance in the domain expertise of these 
specification authors; R2U2 V3.0 therefore enables resource-aware requirements spec- 
ification by users without experience with the performance trade-offs of syntactically 
different but semantically equivalent temporal logic encodings. The R2U2 Configura- 
tion Explorer is a web application that provides visual feedback from C2PO about the 
resource costs of specifications, e.g., in the form of MLTL formulas; see Fig. 3. With 
a short feedback loop on critical parameters like execution time, memory, and relative 
formula size, all a user needs to understand is what resources are available on their 
target system (not R2U2 itself) to write performant specifications that fit the available 
resources. 



Fig. 3. R2U2 Configuration Explorer web application: 1) C2PO specification input; 2) C2PO 
options; 3) C2PO output; 4) AST visualization; 5) AST node data; 6) R2U2 instruction; 7) C 
engine speed and memory calculator; 8) FPGA speed and size calculator; 9) FPGA design size 
vs maximum timestamp value. 

3.1 C2PO Feedback


Feedback from C2PO (elements 1-6 in Fig. 3) allows users to visualize the intermediate
representation of a given input specification as well as the effects of optimizations and 
options on their final R2U2 configurations. Properties such as the memory required to 
represent specifications with differently-sized temporal intervals, or syntactically dif- 
ferent but functionally similar checks, can be unintuitive for users to compute on the 
fly. The AST visualization provides transparency into this process for users unfamiliar 
with R2U2’s implementation via an interactive web-based interface suited to experi- 
mentation with different variations of a possible specification. 


3.2 Software Resource Calculator 


The software resource calculator (element 7 in Fig. 3) provides users of the R2U2 soft- 
ware implementations with an estimate of the time and memory required to evaluate 
one time step of a specification in the worst case. 


Software Worst-Case Execution Time. The highly optimized nature of R2U2’s soft- 
ware implementations makes runtime performance highly dependent on the target plat- 
form’s architecture, C/C++ compiler version, and make environment factors; e.g., the 
length of the current working directory name can impact cache alignment. We use a 
simplified computing model to provide an estimation of the computing speed based on 
the number of CPU cycles required for each operation on the target platform. Users can 
edit these clock cycle values in the GUI, e.g., to test for platform-specific latencies. The 
estimated worst-case execution time (WCET) in software $W_{sw}$ of an AST node $g$ is:

$$W_{sw}(g) = \sum_{c \in C_g} W_{sw}(c) + \mathrm{Cycles}(g.\mathit{type}) \qquad (1)$$

where $C_g$ are the children nodes of $g$ and Cycles is a dictionary mapping AST node
types to a corresponding number of clock cycles. For instance, $\mathrm{Cycles}(\wedge) = 10$ cycles
by default.
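Equation (1) is a bottom-up sum over the AST, which the following Python sketch mirrors directly. It is a minimal illustration, not the GUI's code: the Node class and wcet_sw function are hypothetical, and the cycle count for "load" is an invented placeholder (only the value 10 for conjunction comes from the text).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical AST node: an operator type plus child nodes."""
    type: str
    children: List["Node"] = field(default_factory=list)


def wcet_sw(g: Node, cycles: Dict[str, int]) -> int:
    """Eq. (1): WCET of g in clock cycles = sum over children + Cycles(g.type)."""
    return sum(wcet_sw(c, cycles) for c in g.children) + cycles[g.type]


# "&&" uses the default from the text; the "load" latency is an assumed value.
cycles = {"&&": 10, "load": 5}
spec = Node("&&", [Node("load"), Node("load")])

total_cycles = wcet_sw(spec, cycles)
print(total_cycles)  # 5 + 5 + 10 = 20 cycles
```

Dividing the cycle count by the target clock frequency converts the estimate into time.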


Software Memory Requirements. R2U2 uses Shared Connection Queues (SCQs) to 
store verdict-timestamp pairs for each node in the AST. SCQs are single-writer, many- 
reader circular buffers that buffer the results of dependent temporal expressions that 
might not be evaluated at the same timestamp. The total SCQ size for a specification 
is the total number of SCQ slots required by the specification multiplied by the size of 
one slot. The required number of SCQ slots for a node g is: 


$$\mathrm{size}(g.\mathit{Queue}) = \max\left(\max\{s.\mathit{wpd} \mid \forall s \in S_g\} - g.\mathit{bpd},\ 0\right) + 1 \qquad (2)$$

where $g.\mathit{Queue}$ is the output SCQ of $g$, $s.\mathit{wpd}$ is the worst-case propagation delay of
node $s$, $g.\mathit{bpd}$ is the best-case propagation delay of node $g$, and $S_g$ is the set of sibling
nodes of $g$. The propagation delays of a node represent the minimum and maximum
number of time steps needed to evaluate the node and are defined recursively in Definition 4
of [18]. Intuitively, a node requires enough memory such that its results will not
be overwritten before they are consumed by a parent node. The total SCQ memory of
an AST is the sum of the sizes of SCQs of all nodes in the AST.
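The slot count of Eq. (2) and the resulting memory total can be sketched as follows. This is an illustrative Python model only: the function names are hypothetical, the propagation delays are taken as given inputs rather than computed via Definition 4 of [18], treating a node without siblings as having a maximum sibling delay of 0 is an assumption, and the 4-byte slot size is an example value.

```python
from typing import Iterable


def scq_slots(sibling_wpds: Iterable[int], own_bpd: int) -> int:
    """Eq. (2): slots needed so results survive until the slowest sibling arrives."""
    worst_sibling = max(sibling_wpds, default=0)  # assumption: no siblings -> 0
    return max(worst_sibling - own_bpd, 0) + 1


def scq_memory_bytes(slot_counts: Iterable[int], slot_bytes: int) -> int:
    """Total SCQ memory = total number of slots times the size of one slot."""
    return sum(slot_counts) * slot_bytes


# A node that answers immediately (bpd = 0) next to a sibling that may take 5 steps.
fast_node = scq_slots([5], own_bpd=0)  # 6 slots
# A node whose best case is already later than its sibling's worst case.
slow_node = scq_slots([1], own_bpd=3)  # 1 slot
total = scq_memory_bytes([fast_node, slow_node], slot_bytes=4)
print(fast_node, slow_node, total)
```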

SCQ memory is an estimation of the actual total memory usage, but is typically the 
largest and most constraining memory type, e.g., as compared to instruction or pointer 
memory. The R2U2 C implementation statically fixes all memory sizes in advance to 
avoid dynamic allocation, so the SCQ sizing feedback is useful for: (1) selecting an 
initial size based on expected usage and (2) verifying that a configuration will fit on a
deployed monitor with a fixed SCQ limit. 


3.3 Hardware Resource Calculator 


The hardware resource calculator (elements 8-9 in Fig. 3) provides estimations for
hardware WCET ($W_{hw}$), total SCQ memory slots, and a graph for visualizing estimated
FPGA resource requirements: Look-Up Tables (LUTs) and Block RAMs (BRAMs).
Required resources depend on the type of FPGA architecture. The GUI accepts clock 
rate, LUT-type, timestamp length, and node sizing as parameters to better match the 
estimate to a target platform. This approach was validated on Virtex-5 and Zynq7000 
FPGA platforms as well as the ACTEL ProASIC3L used for Robonaut2 in [18]. 


Hardware Worst-Case Execution Time. The GUI computes the estimated $W_{hw}$ using
a more precise method than in Sect. 3.2 by taking into account SCQ usage during execution.
The R2U2 hardware implementation’s estimated worst-case execution time ($W_{hw}$)
of an AST node $g$ is:

$$W_{hw}(g) = \sum_{c \in C_g} W_{hw}(c) + \mathrm{Latency}_{init}(g.\mathit{type}) + \mathrm{Latency}_{eval}(g.\mathit{type}) * \sum_{c \in C_g} \mathrm{size}(c.\mathit{Queue}) \qquad (3)$$

where $\mathrm{Latency}_{init}$ and $\mathrm{Latency}_{eval}$ are dictionaries mapping AST node types to
microsecond latencies corresponding to the initial and evaluation times of the node,
respectively. The multiplication accounts for evaluation of each buffered input from the child
node, up to the queue size in the worst case.
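Equation (3) can be evaluated with the same bottom-up traversal as the software estimate, with an extra term for buffered child results. The Python sketch below is a hedged illustration: HwNode and wcet_hw are hypothetical names, and all latency values and queue sizes are invented example numbers rather than measured FPGA figures.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HwNode:
    """Hypothetical AST node carrying the size of its output SCQ."""
    type: str
    queue_size: int
    children: List["HwNode"] = field(default_factory=list)


def wcet_hw(g: HwNode, lat_init: Dict[str, float], lat_eval: Dict[str, float]) -> float:
    """Eq. (3): children's WCET + init latency + eval latency per buffered child result."""
    children_time = sum(wcet_hw(c, lat_init, lat_eval) for c in g.children)
    buffered_inputs = sum(c.queue_size for c in g.children)
    return children_time + lat_init[g.type] + lat_eval[g.type] * buffered_inputs


# Assumed latencies in microseconds.
lat_init = {"load": 1.0, "until": 2.0}
lat_eval = {"load": 0.5, "until": 0.5}

spec = HwNode("until", 1, [HwNode("load", 1), HwNode("load", 6)])
estimate_us = wcet_hw(spec, lat_init, lat_eval)
print(estimate_us)  # (1.0 + 1.0) + 2.0 + 0.5 * (1 + 6) = 7.5
```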


Hardware Memory Requirements. The hardware resource calculator provides the
explicit number of SCQ slots required for the collection of specifications in the
specification set (i.e., the configuration) by applying Eq. (2) to each AST node and
summing the resulting sizes.

FPGAs use BRAMs to implement an R2U2 monitor’s SCQ memory, where the size
and number of ports of the BRAMs limit the queue depth of the BRAMs. To compute
the required number of BRAMs, let $d$ be the total SCQ size, $w$ be the bit width of each
verdict-timestamp pair, $w_{max}$ be the widest bit width the BRAM can accommodate,
and $D(w)$ be the maximum queue depth of a BRAM with verdict-timestamp pair bit
width $w$. The required number of cascaded BRAMs is:

$$N_{BRAM}(w, d) = \left\lceil \frac{d}{D(w_{max})} \right\rceil * \mathrm{mod}(w, w_{max}) + \left\lceil \frac{d}{D(\mathrm{rem}(w, w_{max}))} \right\rceil \qquad (4)$$


Hardware LUT Requirements. Each R2U2 operator requires a constant number of
comparator and adder/subtractor LUTs, configured by the user in the GUI. The GUI
accounts for scaling based on the LUT type and uses the bit width of each
verdict-timestamp pair $w$ to estimate total LUT usage. The total number of required comparator
LUTs ($N_{cmp}$) and adder/subtractor LUTs ($N_{add}$) are:

$$N_{cmp}(w) = \begin{cases} 4 * w & \text{if LUT-3} \\ 2 * w & \text{if LUT-4} \\ w & \text{if LUT-6} \end{cases} \qquad N_{add}(w) = \begin{cases} 2 * w & \text{if LUT-3 or LUT-4} \\ w & \text{if LUT-6} \end{cases}$$
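The two piecewise definitions reduce to a per-LUT-type scaling factor applied to the pair width. The Python sketch below only restates those cases; the function names and the example width of 32 bits are assumptions for illustration.

```python
# Scaling factors per LUT type, taken from the piecewise definitions above.
CMP_FACTOR = {"LUT-3": 4, "LUT-4": 2, "LUT-6": 1}
ADD_FACTOR = {"LUT-3": 2, "LUT-4": 2, "LUT-6": 1}


def n_cmp(w: int, lut_type: str) -> int:
    """Comparator LUTs for a verdict-timestamp pair of bit width w."""
    return CMP_FACTOR[lut_type] * w


def n_add(w: int, lut_type: str) -> int:
    """Adder/subtractor LUTs for a verdict-timestamp pair of bit width w."""
    return ADD_FACTOR[lut_type] * w


# Example with an assumed pair width of 32 bits.
for lut in ("LUT-3", "LUT-4", "LUT-6"):
    print(lut, n_cmp(32, lut), n_add(32, lut))
```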


4 Runtime Engine Improvements 


To better serve mission-critical systems that must satisfy strict flight certification 
requirements (such as NASA’s VSM [8-10]), we have made a number of improvements 
to the internal architecture of the C version of R2U2 that provide memory assurances 
and flexibility as well as extended computational abilities. Figure 4 depicts this updated 
architecture. 


Static Memory Arenas. The R2U2 V3.0 C version uses only statically-allocated memory.
This avoids the many pitfalls of allocating memory (slow allocator calls, fragmentation,
leaks, out-of-memory errors, etc.) and guarantees the amount of memory required
for the entire execution of R2U2 up front. Additionally, many mission-critical systems
either do not have or do not permit dynamic memory allocation, e.g., to satisfy
requirements for flight certification [32]. R2U2 now runs unmodified on these platforms as
well as traditional systems.

Each type of memory (yellow boxes of Fig. 4) has a predefined “arena” with a max- 
imum size set during integration of the monitor with the target platform. When a user 
loads an R2U2 configuration, R2U2 fills the slots of these arenas in sequence until the 
arena is full. 
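The fill-in-sequence behavior resembles a fixed-capacity bump allocator. The Python sketch below is a conceptual model only, since the actual arenas are static C storage: the Arena class is hypothetical, and raising an error when the arena is full is an assumption about behavior the text does not specify.

```python
class Arena:
    """Conceptual model of a fixed-capacity arena that is filled in sequence."""

    def __init__(self, capacity: int) -> None:
        self._slots = [None] * capacity  # sized once, never grown afterwards
        self._used = 0

    def push(self, item) -> int:
        """Store item in the next free slot and return its index."""
        if self._used == len(self._slots):
            # Assumed behavior: reject a configuration that does not fit.
            raise MemoryError("arena full")
        self._slots[self._used] = item
        self._used += 1
        return self._used - 1

    @property
    def free(self) -> int:
        """Number of slots still available."""
        return len(self._slots) - self._used


# Loading a two-instruction configuration into an arena sized for two slots.
instructions = Arena(capacity=2)
first = instructions.push("load s0")
second = instructions.push("end n0 f0")
print(first, second, instructions.free)
```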


Monitor Type Parameterization. Complementary to the switch to static memory, the
internals of the reasoning engine are now fully parameterized. A single header file 
allows users to adjust maximum values, bit widths, and even internal types. Proper 
tuning has performance benefits, but crucially allows users to fit R2U2 to use the exact 
amounts of resources available on a target system. For example, limiting the size of 
the gaps between timestamps, e.g., in cases where the specification will be either reset 
frequently or evaluated infrequently, allows more SCQs to fit in the same amount of 
memory permitting larger formula sets with functionally similar behavior. 

Fig. 4. Internal architecture of an R2U2 monitor. Orange boxes are streams of data, yellow boxes 
are memory arenas, and blue boxes are modules. Arrows entering and exiting blue boxes denote 
read and write relationships respectively. The red arrows denote relationships that are only active 
upon startup i.e., when R2U2 populates instruction memory and configures SCQ memory. (Color 
figure online) 


Arbitrary Data Flow. R2U2 initially worked as a stack of engines, at each timestamp 
passing results from the Atomic Checker (AT) to the Temporal Logic engine (TL), then 
passing the TL verdicts through the Bayesian Network (BN) layer to produce that time- 
stamp’s verdict [34]. Now, R2U2 can connect these engines in any order. This simplifies 
configuration generation from the perspective of C2PO, enabling arbitrary ordering of 
instructions. Atomic checker properties can now accept results of temporal logic formu- 
las as input, for example, without adding a confusing step delay in the verdict stream. 


AT Checker Extended Mode. The C version of the atomic checker has an extended 
mode allowing for additional comparisons and filters beyond the standard hardware- 
compatible set. In extended mode, the atomic checker produces Boolean “atomics” 
from conditionals, where each conditional compares the result of a filter to either a 
constant or another input signal. Filters are predefined functions such as simple data 
type casts (bool, int, float, etc.) or mathematical functions like rate, moving average, or 
absolute angle difference. For example: 


• a5 := abs_diff_angle(s3,105) < 50; checks if the absolute difference
  between the data of signal 3 and the value 105 when treated as angles is below 50.
• a43 := int(s32) == s33; checks that the values of signals 32 and 33 are in
  agreement when treated as integers.
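Each atomic therefore has the shape "filter, then compare". The Python sketch below models the two examples under loud assumptions: the text does not define abs_diff_angle, so the degree units and wrap-around at 360 used here are guesses, and the Python function names simply mirror the atomic names for illustration.

```python
def abs_diff_angle(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles (assumed: degrees, wraps at 360)."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def a5(s3: float) -> bool:
    """Models a5 := abs_diff_angle(s3,105) < 50; filter result compared to a constant."""
    return abs_diff_angle(s3, 105) < 50


def a43(s32: float, s33: int) -> bool:
    """Models a43 := int(s32) == s33; filter result compared to another signal."""
    return int(s32) == s33


print(a5(130.0), a5(300.0), a43(7.9, 7))
```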


Booleanizer. The R2U2 V3.0 C implementation includes a new general-purpose computing
module, called the Booleanizer, that uses a three-address code representation [7]
and can take the place of the AT checker. This module enables arbitrary expressions
over non-Boolean data types using arithmetic, bitwise, and relational operators as well
as extended set aggregation operators such as “forexactlyn” or “foratmostn” operators.

5 Discussion


R2U2’s toolchain now provides an effective means by which to formalize, validate, 
and verify system requirements in real time, giving users control and transparency of 
the memory and feature set of their target-specific monitors. We have combined the 
collection of capabilities from previously-published R2U2 case studies into one modu- 
lar, centralized implementation that we have rigorously evaluated for correctness (e.g., 
using [19,38]). 

C2PO and its new specification language enable higher-level abstractions for users 
that make the specification development process faster, more transparent, and less 
reliant on a deep understanding of R2U2’s underlying algorithms. The new GUI front- 
end allows up-front specification design and resource usage estimation by system 
designers so that users can rapidly prototype specifications before downloading and 
using R2U2. These improvements make specifying, validating, and monitoring system 
requirements easier and more accessible to the systems that stand to benefit most from 
RV. Since specification is the biggest bottleneck to formal methods and autonomy [37], 
this is an important feature for an RV engine. 

It is now much easier to integrate R2U2 into production environments, like NASA 
cFS/cFE [25,28] or ROS [31], due to the unified front end compiler, expanded engine 
capabilities, and better user tooling. Recently R2U2 has launched on several real-life, 
full-scale air and space missions, largely enabled by these advancements. This major 
upgrade lays a solid foundation for expanded RV capabilities and integration into a 
wider array of missions and embedded architectures. 